Google is in the hot seat right now, as questions are being raised about the reliability of its AI chatbot, Gemini.
Some of the contractors tasked with rating the accuracy of Gemini’s responses are now being forced to evaluate prompts outside their realms of expertise, TechCrunch reports. Previously, the contractors could skip such prompts; now, they’re told to just “rate the parts of the prompt you understand.” For instance, a contractor with no scientific background might be asked to evaluate answers about specific diseases, calling into question just how accurate these Gemini-generated answers can be.
Reportedly, new guidelines that Google has issued to contractors working on Gemini have led to concerns that the AI chatbot could be more prone to spouting out inaccurate information on highly sensitive topics, like healthcare, to regular people.
READ: Will Meta and Google benefit from banning TikTok? (December 13, 2024)
To improve Gemini, contractors working with GlobalLogic, an outsourcing firm owned by Hitachi, are routinely asked to evaluate AI-generated responses according to factors like “truthfulness.” But it seems that these contractors are now being required by Google to rate responses to prompts they have no expertise in rather than skip them, raising questions about the accuracy of the results generated by Gemini.
Internal correspondence seen by TechCrunch shows that previously, the guidelines read: “If you do not have critical expertise (e.g. coding, math) to rate this prompt, please skip this task.”
But now the guidelines read: “You should not skip prompts that require specialized domain knowledge.” Instead, contractors are being told to “rate the parts of the prompt you understand” and include a note that they don’t have domain knowledge.
This policy could allow inaccuracies to slip into Gemini’s responses, which in turn could set back the search giant’s progress in the AI space.
READ: Google sues Indian engineer for leaking Pixel trade secrets (November 26, 2024)
However, Google has taken a stand: “Raters perform a wide range of tasks across many different Google products and platforms,” said Google spokesperson Shira McNamara. “They do not solely review answers for content, they also provide valuable feedback on style, format, and other factors. The ratings they provide do not directly impact our algorithms, but when taken in aggregate, are a helpful data point to help us measure how well our systems are working.”
Only time will tell whether Google will stand by its position and manage the fallout, or restore the old guidelines and work to improve Gemini’s accuracy.