A brown plate and a 0.5-second inference engine

I order nasi kandar at my usual mamak stall.
One piece of fried chicken.
A boiled egg.
This part never changes.
That day, for no clear reason, I added okra.
One strip of green felt like it brought a little order to the plate.
When the plate was finished, the cashier, the anne, glanced at it briefly.
Then he named the price.
Twelve ringgit.
I had added okra.
Something had increased.
At least, that was how it felt on my side.
But his number did not move.
I began eating and, halfway through, a thought occurred.
In situations like this, might an AI be more accurate?
Detect the ingredients.
Estimate quantities.
Calculate a total.
Fairer.
More transparent.
Possibly faster.
Yet the man in front of me had answered in half a second.
The plate was brown.
All boundaries were gone.
And still, he did not hesitate.

Nasi kandar as input data
From an information engineering perspective, a nasi kandar plate is a worst-case input.
White rice.
Fried chicken.
A boiled egg.
Okra.
Sometimes squid.
Then several ladles of curry are poured over everything.
Red, yellow, black.
They mix, and the final state is simply brown.
Edges disappear.
Shapes collapse.
Ingredients are fully masked.
If I pointed Google Lens at it, the result would likely be just one word: curry.
And yet the cashier looks once and states a price.
Why is this possible?
And could a modern image recognition model beat him?

Latency
Start with speed.
An AI would take a photo, upload it, run inference, return a result.
Even optimized, there would be hundreds of milliseconds, often seconds, of delay.
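That pipeline can be tallied as a toy latency budget. A minimal Python sketch; every stage timing below is an invented assumption, not a benchmark:

```python
# A rough latency budget for a hypothetical cloud-vision pricing app.
# Every figure is an assumption for illustration, not a measurement.
PIPELINE_MS = {
    "capture": 100,    # open the camera, take the photo
    "upload": 300,     # push the image over mobile data
    "inference": 250,  # server-side model forward pass
    "response": 50,    # result travels back to the phone
}

def total_latency_ms(stages: dict[str, int]) -> int:
    """Sum the per-stage delays into one end-to-end figure."""
    return sum(stages.values())

CASHIER_MS = 500  # the anne's observed half-second glance

print(total_latency_ms(PIPELINE_MS), "ms vs", CASHIER_MS, "ms")
```

Even with these generous guesses, the round trip loses to the glance.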
The cashier works differently.
The moment the plate appears, he says “twelve ringgit.”
No pause.
No visible thinking.
No loading indicator.
The verdict is clear.
The cashier wins.
His processing is fully standalone.
Offline.
Zero latency.
This is edge computing in its purest form.

Occlusion
Next comes the domain where AI struggles most.
Ingredients buried under curry.
An egg hidden behind okra.
Squid submerged in sauce.
Vision alone is not enough.
If it cannot be seen, it may as well not exist.
AI reasons from pixels.
The cashier does not.
He looks at volume.
At how the rice rises.
At weight and density.
From tens of thousands of plates stored in memory, he infers: this bulge is an egg.
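That prior-driven guess can be caricatured in code: pick the ingredient that bulges of this size have most often turned out to be. The frequency table below is entirely invented:

```python
# Inference from priors rather than pixels: guess what a bulge is from
# how often bulges of that size turned out to be each ingredient.
from collections import Counter

# Invented tallies standing in for tens of thousands of remembered plates.
PAST_BULGES = Counter({
    ("small", "okra"): 4_000,
    ("medium", "egg"): 9_000,
    ("medium", "chicken"): 2_000,
    ("large", "chicken"): 7_000,
})

def guess_ingredient(bulge_size: str) -> str:
    """Return the most frequent ingredient for a bulge of this size."""
    candidates = {
        ingredient: count
        for (size, ingredient), count in PAST_BULGES.items()
        if size == bulge_size
    }
    return max(candidates, key=candidates.get)

print(guess_ingredient("medium"))  # the bulge under the curry
```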
If uncertain, he can fall back on a long-established API.
“What’s this?”
Voice input.
For a human, that fallback always exists.
Call it a draw, or a narrow win for the cashier.
Humans do not rely on vision alone.

Energy efficiency
Then there is cost.
On the AI side: GPUs, servers, cooling systems, electricity.
Large amounts of energy consumed out of sight.
On the human side:
One cup of teh tarik in the morning.
A piece of roti canai.
That is enough to sustain high-precision inference well past noon.
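The comparison survives a back-of-envelope calculation. The calorie and wattage figures below are rough assumptions, not measurements:

```python
# Back-of-envelope energy comparison. All numbers are rough assumptions.
KCAL_PER_WH = 0.86  # one watt-hour is about 0.86 kcal

teh_tarik_kcal = 150    # assumed, one cup
roti_canai_kcal = 300   # assumed, one piece
human_wh = (teh_tarik_kcal + roti_canai_kcal) / KCAL_PER_WH  # whole-body budget

gpu_watts = 300   # one inference GPU under load, assumed
shift_hours = 4   # a morning of pricing plates
gpu_wh = gpu_watts * shift_hours

print(f"human: ~{human_wh:.0f} Wh, GPU: {gpu_wh} Wh")
```

And the breakfast powers the whole cashier, not just the pricing.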
The verdict is decisive.
The cashier is an extremely energy-efficient computing unit.

Explainability
AI can explain itself.
“Chicken: 5 RM.
Okra: 1 RM.
Egg: 1 RM.
Total: 7 RM.”
Clear.
Transparent.
Consistent.
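That receipt is trivial to produce in code. Prices are the illustrative figures above, in ringgit:

```python
# The transparent, itemized receipt an AI could print.
# Prices in RM, taken from the illustrative figures in the text.
MENU_RM = {"chicken": 5, "okra": 1, "egg": 1}

def itemized_total(items: list[str]) -> tuple[list[str], int]:
    """Return the receipt lines and the grand total."""
    lines = [f"{item.capitalize()}: {MENU_RM[item]} RM" for item in items]
    return lines, sum(MENU_RM[item] for item in items)

lines, total = itemized_total(["chicken", "okra", "egg"])
print("\n".join(lines))
print(f"Total: {total} RM")
```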
The cashier does not explain.
Or cannot.
His logic is a black box.
Sometimes it is cheaper.
Sometimes more expensive.
Regular-customer adjustment.
Mood coefficient.
Dynamic pricing for tourists.
None of this is logged.
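If someone did try to log it, the function might look like this. Every coefficient below is a guess; that is the joke:

```python
# The cashier's un-logged pricing function, reverse-engineered in jest.
# All coefficients are invented; none of this is written down anywhere.
def cashier_price(base_rm: float, *, regular: bool = False,
                  mood: float = 1.0, tourist: bool = False) -> int:
    price = base_rm
    if regular:
        price -= 1.0    # regular-customer adjustment
    price *= mood       # mood coefficient
    if tourist:
        price *= 1.2    # dynamic pricing for tourists
    return round(price)

print(cashier_price(12.0))                # same plate...
print(cashier_price(12.0, regular=True))  # ...different price
```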
Yet this fluctuation is part of the dish itself.

Discretion cannot be implemented
In raw recognition accuracy, AI will eventually catch up.
Occlusion and sauce will be solved.
Still, it will not win.
The reason is simple.
The cashier has discretion.
The same plate can cost different amounts, depending on who is standing in front of him.
This is inaccurate.
Inefficient.
Illogical.
And yet that ambiguity binds the shop and its customers.
The accounting of nasi kandar is not calculation.
It is an algorithm of human relationships.
For now, this remains human territory.
