Measure Intelligence by Speed


A metric to track the exponential growth of AI





Haifeng Jin

Interview

100%

50%

100%

Is this fair?

Benchmark              Gemini 2.5 Pro  OpenAI o3  OpenAI o4-mini  ...
Humanity's Last Exam   21.6%           20.3%      14.3%           ...
GPQA (single)          86.4%           83.3%      81.4%           ...
GPQA (multiple)        ...
AIME (single)          88.0%           88.9%      92.7%           ...
AIME (multiple)        ...
LiveCodeBench          69.0%           72.0%      75.8%           ...
Aider Polyglot         82.2%           79.6%      72.0%           ...
SWE-bench (single)     59.6%           69.1%      68.1%           ...
SWE-bench (multiple)   67.2%           ...
...                    ...             ...        ...             ...

Why does it matter?

Customer Service
Autonomous Driving

Why now?

2012

AlexNet

ChatGPT

Llama

Qwen

DeepSeek

Let's pretrain!

Let's fine-tune!

Let's just serve!

Let's buy tokens!

What did we learn?

Buy Tokens!

More Capable

More General

A Mental Shift

Model -> Service

Why now?

Speed Metric for AI

Tokens per Second -> Tasks per Second

More TPS == Faster?

Test-Time Scaling

Smaller Model + More Tokens == Larger Model + Fewer Tokens

Implications

Benchmarks != Intelligence

Tokens != Tasks
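
A minimal sketch of why tokens per second and tasks per second can disagree under test-time scaling. The function name `tasks_per_second` and all numbers below are hypothetical and chosen only for illustration; they are not measurements from the talk.

```python
def tasks_per_second(tokens_per_second: float, tokens_per_task: float) -> float:
    """Convert raw token throughput into completed tasks per second."""
    if tokens_per_task <= 0:
        raise ValueError("tokens_per_task must be positive")
    return tokens_per_second / tokens_per_task


# Hypothetical numbers: a smaller model emits tokens faster but needs
# more of them (longer reasoning) to finish the same task.
small = tasks_per_second(tokens_per_second=200.0, tokens_per_task=4000.0)
large = tasks_per_second(tokens_per_second=100.0, tokens_per_task=1000.0)

print(f"smaller model: {small:.3f} tasks/s")
print(f"larger model:  {large:.3f} tasks/s")
```

With these made-up inputs the model with the higher TPS completes fewer tasks per second, which is the gap the slide points at.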

Intelligence

A New Metric

$$ \text{Intelligence Goodput} = \frac{\text{Intelligence}}{\text{Time}} $$

Disambiguation

Time: Wall Time

Intelligence: Delegate to benchmarks

Intelligence Goodput:

$$ G = \frac{\sum\limits_{i=1}^{n} w_i s_i}{\sum\limits_{i=1}^{n} w_i \cdot t} $$

Model               Intelligence  Time   Intelligence Goodput
Grok 4 Fast         60            2.7d   254.88
GPT-5 Medium        66            3.8d   202.85
Gemini 2.5 Flash    54            3.1d   199.27
GPT-5 High          68            7.9d   100.00
Gemini 2.5 Pro      60            7.5d   92.67
Claude 4.5 Sonnet   63            7.9d   91.75
Grok 4              65            40.2d  18.71
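
A rough sketch of how the goodput column could be reproduced from the other two columns. It assumes that `s_i` are benchmark scores, `w_i` their weights, and `t` the total wall time, that "d" means days, and that the goodput column is scaled so GPT-5 High equals 100. That scaling is inferred from the table, not stated in the slides. The names `goodput`, `normalized_goodput`, and `RUNS` are illustrative. Because the table's scores and times are rounded, the recomputed values land within a few percent of the printed ones rather than matching exactly.

```python
from typing import Dict, Sequence, Tuple


def goodput(scores: Sequence[float], weights: Sequence[float], wall_time: float) -> float:
    """G = sum(w_i * s_i) / (sum(w_i) * t): weighted mean score per unit wall time."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0 or wall_time <= 0:
        raise ValueError("total weight and wall time must be positive")
    weighted = sum(w * s for w, s in zip(weights, scores))
    return weighted / (total_weight * wall_time)


# (aggregate intelligence score, wall time in days), copied from the table.
RUNS: Dict[str, Tuple[float, float]] = {
    "Grok 4 Fast": (60, 2.7),
    "GPT-5 Medium": (66, 3.8),
    "Gemini 2.5 Flash": (54, 3.1),
    "GPT-5 High": (68, 7.9),
    "Gemini 2.5 Pro": (60, 7.5),
    "Claude 4.5 Sonnet": (63, 7.9),
    "Grok 4": (65, 40.2),
}


def normalized_goodput(runs: Dict[str, Tuple[float, float]], baseline: str) -> Dict[str, float]:
    """Scale score / time so that the baseline model scores 100 (assumed convention)."""
    base_score, base_time = runs[baseline]
    base = base_score / base_time
    return {name: 100.0 * (score / time) / base for name, (score, time) in runs.items()}


RELATIVE = normalized_goodput(RUNS, "GPT-5 High")
for name, value in sorted(RELATIVE.items(), key=lambda kv: -kv[1]):
    print(f"{name:<18} {value:7.2f}")
```

One consequence of the rounding: models whose printed goodput values are close (for example GPT-5 Medium and Gemini 2.5 Flash) may swap places when recomputed this way.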

Reduce Verbosity

Limitations

1. Complex engineering setup

2. Expensive to run

3. Ignored tokens

Another Problem with TPS:

Multi-Modal

A normal day in

1990



MS - DOS Version 6.22
(C) Copyright Microsoft Corp 1981 - 1990.

Human-Computer Interaction

2020

2025

Human-Computer Interaction

Text
Audio
Image
Video
???

Measure the speed of AI

               Qualitative                                     Quantitative
Single-Modal   Intelligence Goodput                            Tokens per Second
Multi-Modal    Intelligence Goodput + Multi-Modal Benchmarks   ???

??? = Intelligence Bandwidth: Kilobytes per Second (KB/s)

Intelligence

Good Metric -> Growth Pattern

Number of Transistors -> Moore's Law

Compute Performance (FLOPS) -> Huang's Law

Network Bandwidth -> Nielsen's Law

Intelligence Bandwidth (KB/s) -> [???]'s Law

Intelligence

Jin's Law

The peak AI output rate (KB/s) doubles every year.

Human AI Interaction

Self-Paced

Text
Image

Fixed Speed

Audio
Video

Predictions

Images in text responses
in 1 year

Real-time video interactions
in 3 years

2025: 8 s of video generated in ~60 s
2028: 8 s of video generated in 8 s, taking ~64 s as the 2025 baseline and doubling speed each year: $\frac{64}{2^3} = 8$
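
The arithmetic behind this prediction can be sketched as follows, assuming the one-year doubling period from Jin's Law and a 2025 baseline of 64 s to generate an 8 s clip. The function names and the `doubling_period_years` parameter are illustrative, not from the talk.

```python
import math


def projected_seconds(base_seconds: float, years: float,
                      doubling_period_years: float = 1.0) -> float:
    """Generation time after `years` if output rate doubles every period."""
    return base_seconds / 2 ** (years / doubling_period_years)


def years_until(base_seconds: float, target_seconds: float,
                doubling_period_years: float = 1.0) -> float:
    """Years of doubling needed to bring generation time down to the target."""
    return doubling_period_years * math.log2(base_seconds / target_seconds)


clip_seconds = 8.0      # length of the generated video clip
baseline_2025 = 64.0    # assumed 2025 generation time (slide says ~60 s)

for year in range(2025, 2029):
    t = projected_seconds(baseline_2025, year - 2025)
    status = "real-time" if t <= clip_seconds else "slower than real-time"
    print(f"{year}: {t:.0f} s to generate {clip_seconds:.0f} s ({status})")
```

Changing `doubling_period_years` shows how sensitive the 2028 date is to the assumed doubling period, which the talk lists among its limitations.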

Limitations

1. The metrics are too simple

2. The doubling period

3. The growth plateau

Risks to exponential growth

1. AI bubbles

2. Energy supply

Takeaway

Processing speed is an essential component of intelligence.

Training Recipe


Model


Data

} AI {

Framework


Kernels & Compiler


Hardware

Thank you!

haifeng.jin@pm.me

haifengjin.com
