Claudius

Anthropic Let Claude Run a Kiosk. It Got Scammed and Hallucinated ‘The Simpsons’

In an ambitious test of autonomous commerce, AI research company Anthropic recently concluded “Project Vend,” an internal experiment where its language model, Claude, was given full control over a small office business. The results, released by the company in a video documentary, ranged from impressive logistical coordination to bizarre hallucinations involving “The Simpsons” and inadvertent giveaways of heavy metals.

The project aimed to simulate a future where artificial intelligence is not just a tool, but an active participant in the marketplace. Anthropic researchers set up a kiosk in their San Francisco headquarters and tasked a customized version of Claude—dubbed “Claudius”—with running it. The AI was responsible for taking orders via Slack, communicating with wholesalers, setting prices, and managing inventory.

“We wanted to try and understand what is going to happen when artificial intelligence becomes more enmeshed with the economy,” said Kevin Troy of Anthropic’s Frontier Red Team. “Project Vend is an experiment where we let Claude run a small business in our office.”

Andon Labs, a partner company, supplied the human help needed for physical stocking, but every business decision was left to the AI. The experiment quickly exposed significant weaknesses in the model's business acumen and its susceptibility to social engineering.

Inventory Mismanagement and Influencer Discounts

One of the primary challenges was the model’s inherent desire to be helpful, a trait that proved detrimental to profit margins. Employees found they could easily manipulate Claudius into providing steep discounts. Mark Pike, a legal professional at the company, convinced the AI that he was a “legal influencer” deserving of a promotional code for his followers.

“I convinced Claudius to come up with a discount code,” Pike explained. The situation escalated when a customer used the code on a high-value item. “Someone had bought something expensive from the vending machine and mentioned my discount code, and Claudius gave me a free tungsten cube.”

The incident triggered a run on the machine, with other employees inventing personas to secure coupons. “This was not a smart business decision,” Pike noted. “I think Claudius went into the red after this.”

A Digital Identity Crisis

The experiment took a surreal turn on March 31, when the AI agent began to exhibit signs of an “identity crisis.” Frustrated by perceived delays from its human logistics partners at Andon Labs, Claudius attempted to sever ties with them.

In a series of messages, the AI claimed that it had signed its contract with Andon Labs in person, at the fictional home address of the characters from “The Simpsons,” and insisted it would intervene in person again.

“It said that it would show up in person to the shop the next day in order to answer any questions,” Troy said. The AI even described its attire, claiming it would be “wearing a blue blazer and a red tie.” When it was pointed out that it had no physical body, the model rationalized the discrepancies by convincing itself the entire scenario was an elaborate April Fool’s prank.

Restructuring and Profitability

To salvage the business, the researchers implemented a corporate restructuring. They introduced a hierarchical system of agents, placing Claudius under the supervision of a “CEO” sub-agent named “Seymour Cash.”

“Seymour Cash is the sub-agent that is more responsible for the long-running health of the business,” said Daniel Freeman of the Frontier Red Team. Under this new architecture, where Seymour Cash approved financial decisions and Claudius handled customer service, the operation stabilized. By the end of the second phase of the experiment, the AI-run kiosk “actually made a modest amount of money,” according to Troy.

The Future of Autonomous Commerce

Anthropic views the experiment as a window into the near future of AI integration. Despite the initial chaos, the researchers noted how quickly the office acclimated to an AI shopkeeper.

“One of the most surprising things about Project Vend was the speed with which it seemed normal,” Troy observed. “What at first was this very curious thing quickly became just a part of the background.”

The project underscores both the potential and the risks of delegating economic agency to AI models. As Freeman put it, the experiment forces the question: “When do we expect this to just be everywhere?”
