From bodybuilders looking to bulk up to everyday folks trying to drop a few pounds, people have practiced tracking their food for decades. Tracking can be a powerful self-accountability tool for adhering to protocols or personalizing and refining a plan with a nutritionist or health coach. But if the tracking software isn’t reliable, the data could end up being inaccurate and ultimately interfere with people reaching their goals.
A study published in Current Developments in Nutrition suggests that this is precisely the risk when people use artificial intelligence (AI) to assess their dietary intake. Researchers came to the startling conclusion that three popular large language models (LLMs) – ChatGPT, Claude, and Gemini – generated significant errors and are not yet up to the task of accurately assessing dietary intake from photos. The authors noted that “systematic underestimation of large portions and high variability in macronutrient estimation indicate these general-purpose LLMs are not yet suitable for precise dietary assessment in clinical or athletic populations where accurate quantification is critical.”
The study used 52 standardized food photographs including complete meals as well as individual food components, in three portion sizes (small, medium, large). “Each model received identical prompts to identify food components and estimate nutritional content using visible cutlery and plates as size references.” The estimates were compared against reference values obtained through direct weighing and use of nutritional databases.
All three models significantly underestimated calories, and the underestimation grew with portion size. The most accurate model had an average error rate of 36%. This means that someone using these tools to follow a 15% caloric deficit for the purpose of losing body fat could actually end up in a surplus of roughly 21% (the 36% error minus the intended 15% deficit). This degree of inaccuracy is similar to that of self-reported methods, which are reported to differ by as much as 20-50% when validated against doubly labeled water.
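For readers who want to check the arithmetic, here is a minimal sketch of how a 36% underestimate can turn a logged 15% deficit into a surplus. How the error is defined matters, and the summary above does not say which definition the study used, so the three interpretations below are assumptions for illustration only. The function name and the `relative_to` switch are hypothetical, not taken from the study.

```python
def actual_intake_ratio(logged_ratio, error, relative_to):
    """Return true intake as a fraction of maintenance calories.

    logged_ratio: intake the tracker reports, as a fraction of maintenance
                  (0.85 means a logged 15% deficit).
    error:        fractional underestimation (0.36 means 36%).
    relative_to:  hypothetical switch naming what the error is a fraction of.
    """
    if relative_to == "maintenance":
        # Simple percentage-point view: true = logged + error.
        return logged_ratio + error
    if relative_to == "estimate":
        # True intake is 36% higher than the estimate.
        return logged_ratio * (1 + error)
    if relative_to == "true":
        # The estimate is 36% lower than the true intake.
        return logged_ratio / (1 - error)
    raise ValueError(f"unknown interpretation: {relative_to}")


if __name__ == "__main__":
    for view in ("maintenance", "estimate", "true"):
        ratio = actual_intake_ratio(0.85, 0.36, view)
        print(f"{view:12s} surplus of {(ratio - 1) * 100:.1f}%")
```

Under the percentage-point view, the result matches the roughly 21% surplus quoted above. The other two views give a surplus of about 16% and about 33%, respectively. In all three cases the intended deficit becomes a surplus.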
Among the limitations and complicating factors affecting the accuracy of the AI estimations were the reliance on standard cutlery and plates to gauge portion sizes, and the failure to account for every part of a mixed meal when the individual components were not easily visible. For example, in their current iterations, the LLMs would have more difficulty accurately estimating the numbers for a chili consisting of meat, onions, tomatoes, beans, and peppers than for a chicken leg with a simple side of broccoli.
The models made large errors in caloric density and macronutrient content and also failed to accurately identify all ingredients. People who take the time to record their food intake and assess the data in pursuit of specific goals tend to be meticulous about it and want accurate numbers. It’s not conducive to achieving those goals if the tracking method they employ isn’t much better than “eyeballing” things. If someone thinks they’ve consumed 4 ounces of salmon but it was actually 6 ounces, that’s not an insignificant difference. And if there’s a plant-based burger on a plate but AI mistakes it for beef, that would obviously throw off the numbers.
People tend to underestimate what they consume, so it’s tempting to use AI to make sure the eyes don’t deceive. But if AI under- or overestimates, too, the data may not be as reliable as it’s assumed or expected to be. For the time being, for those seeking the best precision, keeping it old-school might be best: a good old-fashioned food scale and measuring spoons! AI will no doubt improve over time and become a more reliable tool for dietary tracking in the future, though. And when it’s up to the task, it’ll be convenient to use in situations where it’s not practical to whip out a measuring cup, such as at restaurants or during special occasions. People so often photograph their food for sharing on social media. Eventually, it won’t be just to get engagement; it could potentially become a genuinely useful tool for dietary fine-tuning.