A recent Wall Street Journal report on a workplace trend called “tokenmaxxing” offers a revealing glimpse into some of the confusion attending America’s AI boom.
Some companies, the Journal reports, are experimenting with measuring an employee’s engagement with AI by tracking “tokens”—the units into which the system converts text typed into prompts. Now, in some workplaces, it seems token consumption has become a badge of an AI user’s engagement, experimentation, or productivity.
This is a striking moment. During what often feels like a national celebration—or national heart attack—over the transformative productive potential of artificial intelligence, we are publicly debating whether an employee’s value might be measured by the volume of text sent to and from a chatbot.
The controversy deserves more attention than its odd jargon suggests. It exposes a central uncertainty in the AI revolution: what, exactly, does productive use of AI mean?
Writing in Built-In, Ellen Glover reports that tokenmaxxing “is taking much of the tech industry by storm… individuals are ranked on leaderboards based on how much they use AI, with generous perks and incentives encouraging them to push these tools to their limits… The assumption is that the more you use AI, the more productive you must be. Those who lean in the hardest will come out on top.”
She adds that some employees take advantage of the fact that “systems use AI agents to work autonomously for hours on end, reviewing and editing large codebases and writing entire programs while their human users are out living their lives.”
Tokens are real enough. Large language models do not “read” language as humans do. They convert words, punctuation, fragments of words, and other text elements into tokens—standardized units processed mathematically. The more tokens used, generally, the more computing resources consumed. AI providers often charge by token volume. Tokens therefore matter to engineers, accountants, and software managers.
When tokens migrate from a technical unit used in billing into a measure of employee performance, however, we risk confusing the cost of computation with the creation of value.
Admittedly, that equation would not be new. Management history is full of attempts to measure what is easy to measure rather than what is important. The blunt historic examples are well known: when the British government in Delhi put a bounty on dead cobras, enterprising locals bred cobras to kill for the bounty. During the Vietnam War, the stated strategy was “winning the hearts and minds of villagers,” but the “body count” became the actual metric. Closer to the topic at hand, IBM in the 1980s began to measure programmer productivity by lines of code written (“source lines of code,” or SLOC). The example has become a classic in the field: the incentive favored programmers who wrote long, inefficient code to meet their quotas, while a programmer who could nail a complex problem with five elegant lines was scored as “unproductive.”
Now we seem to be testing the equation: prompts sent = value created.
The temptation is understandable. Measuring real productivity in knowledge work has always been difficult. If a machinist produces 50 precision parts in a shift and another produces 20, comparison is at least possible. But how does one compare two analysts, marketers, editors, researchers, lawyers, or managers?
One employee may produce fewer memos but better decisions. Another may write more pages but create confusion. One may solve a crisis with a ten-minute insight. Another may consume three days generating activity. In modern office work, the most valuable contributions are often invisible until later.
Artificial intelligence does not eliminate this problem and may indeed intensify it.
Suppose one employee uses AI constantly—drafting emails, summarizing calls, rewriting notes, brainstorming slogans, asking endless follow-up questions, generating presentation decks, and polishing language. Another uses AI sparingly but strategically—clarifying a difficult concept, checking a spreadsheet formula, testing objections to a proposal, accelerating a first draft, then applying judgment and revision. Tokenmaxxing makes the question sound rhetorical: Which employee is more productive? But the answer is not obvious.
Heavy AI usage can reflect creativity and initiative. It can also reflect confusion, dependency, indecision, poor training, performative busyness, or simple fascination with a new tool. Light AI usage can reflect resistance and stagnation. It can also reflect mastery, efficiency, and independent competence.
A few independent analysts, including the code-analysis firm Jellyfish, have rushed to study the efficacy of tokenmaxxing. Their initial conclusions are limited but telling. In an April 15, 2026, article, Nicholas Arcoland, Ph.D., reported that “We analyzed 12,000 developers across 200 companies in Q1 of this year. What we found is that while more tokens do correlate with more output, they come at a dramatically higher price point per unit…
“At a high level, token usage varies wildly across developers.
“The typical user (50th percentile) consumes about 51 million tokens per month on AI coding. Meanwhile, the 90th percentile user consumes more than seven times that amount, at roughly 380 million tokens per month. A relatively small group of power users is driving a disproportionate share of total token consumption….
“What do you get for all those tokens?
“…higher token usage does lead to more output, but not proportionally. The cost per merged PR increases from just $0.28 in the lowest usage tier to $89.32 in the highest.
“More tokens means more output, but at a much higher price per unit.”
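The shape of that curve can be sketched in a few lines. The tokens-per-developer and merged-PR figures below are invented for illustration (as is the blended price per million tokens); only the pattern matters: output grows more slowly than token spend, so cost per merged pull request climbs across usage tiers.

```python
# Hypothetical usage tiers: (monthly tokens per developer, merged PRs per month).
# All numbers invented for illustration of diminishing returns.

PRICE_PER_MILLION = 3.00  # assumed blended price per million tokens

tiers = {
    "low":    (5_000_000, 12),
    "median": (51_000_000, 20),
    "power":  (380_000_000, 28),
}

for name, (tokens, merged_prs) in tiers.items():
    cost = tokens / 1_000_000 * PRICE_PER_MILLION
    print(f"{name:>6}: ${cost / merged_prs:,.2f} per merged PR")
```

Run with these invented inputs, the cost per merged PR rises with each tier even though total output also rises, which is exactly the pattern the Jellyfish figures describe.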
Metrics tend to appeal because they relieve managers of judgment. A leaderboard of token usage appears objective; numbers can be ranked, and executives can announce measurable progress in AI adoption. But apparent objectivity is not actual understanding.
Deep gains in productivity are often achieved not by doing more tasks faster, but by doing the right tasks, avoiding the wrong ones, framing problems correctly, and making better decisions. An AI system may generate five possible marketing campaigns in seconds; it still takes judgment to decide whether any fits the brand, the market, the budget, or the moment. AI can summarize 10 reports; it still takes conceptual clarity to know what matters in them. AI can produce a polished memo; it still takes responsibility to decide whether the memo should be sent.
A 2025 McKinsey & Company report, “What Is Productivity,” pointed out that an astonishingly small group of firms contributes the lion’s share of productivity growth (fewer than 100 out of 8,300 firms in three countries accounted for two-thirds of it during the period studied). All of them made one or more of five strategic moves: (1) scaling more productive business models or technologies; (2) shifting regional and product portfolios toward the most productive businesses; (3) reshaping customer value propositions to increase revenue and value added; (4) building scale and network effects; and (5) transforming operations to raise labor efficiency and reduce external cost at scale.
Yes, broadly stated, those achievements point to decisions at the executive level. At the same time, the rarity of success suggests that those executives placed a top-down emphasis on human agency: the capacity to choose goals, prioritize means, recognize context, exercise responsibility, and direct tools toward purposeful ends. Those capacities make the difference between activity and achievement. AI can assist agency and serve as its tool, but at least for now it cannot replace the need for it.
And the consequences go well beyond the issue of tokenmaxxing. Much of the public rhetoric around AI assumes that productivity rises automatically as machine usage rises. Add more AI, and output climbs. Replace more workers, and efficiency follows.
History offers reasons for caution.
When spreadsheets became common, the best executives were not those who opened the most spreadsheets. When search engines arrived, the best researchers were not those who ran the most searches. When calculators spread, the best mathematicians were not those who pressed the most buttons. Tools amplify ability only when guided by ability.
So it is with AI. The worker who asks sharper questions, spots errors quickly, knows when to distrust outputs, understands the customer, grasps the larger objective, and accepts responsibility for results may create more value with modest AI use than another worker who generates oceans of machine text.
Thus companies adopting the token-use metric may be measuring the wrong thing in their very first phase of AI adoption. Better to ask harder questions: Did project cycle times improve? Did customer satisfaction rise? Did revenue per employee increase?
Of course, innovations such as tokenmaxxing as a measure of programmer productivity are the lifeblood of firms in a free economy. This example deserves special scrutiny only as an indicator of the strategies under consideration in the explosively growing, fiercely competitive, and internationally important AI sector.
Such innovations in a vibrant American market economy are tested by the standard of profitability, so tokenmaxxing ultimately will be measured by its competitive advantage in advancing the overall productivity of the U.S. economy. Herein lies the decisive advantage of the United States in the much-touted AI competition. In a recent article in The Daily Economy, I looked at the People’s Republic of China’s bid for international hegemony, including in AI.
In the end, decisions about advancing AI in China are political. That is a warning to the United States to reckon with the costs of yielding AI development to regulation and dictate. The recent confrontation between Anthropic and the Pentagon over Anthropic’s “guardrails” for the use of AI goes to the heart of the issue.
It is the free-market economy that has delivered the incomparable progress of artificial intelligence, with all its protean potential. It is an advantage America cannot afford to squander.