Groq calls its architecture the Tensor Streaming Processor (TSP). Two years ago, it said it had hired eight of the ten people who were developing Google's Tensor Processing Unit (TPU).
The company raised $62.3 million.
Groq says the architecture is capable of up to one quadrillion operations per second (1 PetaOp/s, or 1e15 ops/s) and up to 250 trillion floating-point operations per second (250 TFLOPS).
“Leading GPU companies have told customers that they expect to deliver PetaOp/s performance over the next few years; Groq is announcing it today,” says Groq CEO Jonathan Ross. “Groq's architecture is many times faster than anything else available for inference, both in terms of low latency and inferences per second. We had first silicon, enablement on day one, applications launched within week one, and samples to partners and customers in less than six weeks, with A0 silicon going into production.”
Groq claims the TSP's software-centric architecture delivers computational flexibility and massive parallelism without the synchronization overhead of traditional GPU and CPU architectures.
Groq's architecture can support both traditional and emerging machine learning models, and is currently running at customer sites on both x86 and non-x86 systems.
The architecture is designed specifically for the performance requirements of computer vision, machine learning, and other AI-related workloads.
Execution scheduling occurs in software, freeing up silicon real estate that is otherwise dedicated to dynamically executing instructions.
The tight control provided by this architecture enables deterministic processing that is especially valuable for applications where safety and accuracy are paramount.
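The idea behind static, software-driven scheduling can be illustrated with a toy sketch. This is not Groq's actual compiler or instruction set; it simply assumes a program expressed as a dependency graph, where a "compiler" assigns every operation a fixed cycle up front, so every run takes exactly the same number of cycles with no runtime arbitration:

```python
# Toy illustration (NOT Groq's compiler): operations are assigned fixed
# cycles at compile time based on their dependencies, so execution
# latency is fully deterministic -- no dynamic scheduling at runtime.

def compile_schedule(ops):
    """Assign each op a fixed cycle, one cycle after its latest dependency."""
    cycle_of = {}
    for name, deps in ops:
        cycle_of[name] = 1 + max((cycle_of[d] for d in deps), default=-1)
    return cycle_of

# A small dependency graph: two loads feed a multiply, then accumulate, then store.
program = [
    ("load_a", []),
    ("load_b", []),
    ("mul", ["load_a", "load_b"]),
    ("acc", ["mul"]),
    ("store", ["acc"]),
]

schedule = compile_schedule(program)
print(schedule)                      # identical cycle assignment on every run
print(max(schedule.values()) + 1)    # total latency in cycles: 4
```

Because the schedule is fixed before execution, latency is known exactly in advance, which is the property the deterministic-processing claim refers to.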
Compared to complex traditional architectures based on CPUs, GPUs and FPGAs, the Groq chip also streamlines qualification and deployment, allowing customers to implement simple, scalable systems with high performance per watt quickly and easily.
Source: electronicsweekly.com