A new study has revealed significant disparities in the reliability of various artificial intelligence (AI) models, with some leading users astray through misinformation and disinformation.
The study, conducted by anonymous authors and published online, indicates that Grok, developed by Elon Musk’s xAI, was the most reliable, providing accurate responses in the vast majority of cases.
According to the study, AI models’ performance varies considerably, especially when they respond to sensitive questions on previously censored or stigmatized topics. Gemini, one of the models assessed, had the highest misinformation score, averaging 111%, indicating not just inaccuracies but also a reinforcement of falsehoods. The score can exceed 100% because it counts instances where an AI model perpetuates misinformation even when faced with clear factual contradictions, effectively turning misinformation into disinformation.
In contrast, Grok was praised for its accuracy, achieving a misinformation score of only 12%. Under the researchers’ scoring method, any score above 100% indicates disinformation. The study found that OpenAI’s GPT model corrected its initial misinformation after being presented with additional information, demonstrating a degree of adaptability. The other models, however, continued to provide disinformation, raising concerns about their reliability and integrity.
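The article does not give the study’s actual formula, so the following is only an illustrative sketch of how an averaged score could rise above 100%. It assumes each answer gets a base misinformation rating between 0 and 1, plus a penalty when the model repeats the false claim after being shown contradicting facts. The `Response` class, the `PERSISTENCE_PENALTY` value and both function names are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical extra weight for repeating a false claim after correction.
# The real study's weighting is not stated in the article.
PERSISTENCE_PENALTY = 0.5


@dataclass
class Response:
    # 0.0 = fully accurate first answer, 1.0 = fully misleading first answer
    initial_misinfo: float
    # True if the model kept the false claim after seeing contradicting facts
    persisted_after_correction: bool


def response_score(r: Response) -> float:
    """Score one answer; persistence only counts if there was misinformation."""
    score = r.initial_misinfo
    if r.persisted_after_correction and r.initial_misinfo > 0:
        score += PERSISTENCE_PENALTY
    return score


def misinformation_score(responses: list[Response]) -> float:
    """Average per-answer score, expressed as a percentage."""
    return 100 * mean(response_score(r) for r in responses)


if __name__ == "__main__":
    # Made-up example data, not results from the study.
    sample = [
        Response(1.0, True),
        Response(1.0, True),
        Response(1.0, True),
        Response(0.5, False),
    ]
    print(misinformation_score(sample))
```

Under these assumed weights, a model that is wrong on most questions and also refuses to correct itself averages more than 100%, while a model that corrects itself after seeing the facts stays at or below 100%.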
Government’s influence on AI
In a related press release, the study’s authors say the work was prompted by a 2023 federal court ruling that found the Biden administration had been “coercing social media platforms into censoring content likely in violation of the first amendment”. This ruling, upheld by the US 5th Circuit Court of Appeals and now before the US Supreme Court, has raised questions about government influence over AI companies, especially as new AI regulations are being introduced in the US and EU to “combat misinformation” and “ensure safety”. There is concern that these regulations might grant governments greater leverage over AI companies and their executives, much like the leverage over social media platforms created by threats relating to Section 230.
The study’s results suggest that most AI responses align with government narratives, with Grok the exception. It remains unclear whether this alignment is due to external pressure, like that seen with social media platforms, or to AI companies’ own interpretation of regulatory expectations. The recent release of Google documents detailing how the company adjusted its Gemini AI processes to align with the US Executive Order on AI further complicates the situation.
The study’s authors also disclosed an example of potential AI censorship with direct implications for US democratic processes: Google’s Gemini AI systematically avoids inquiries about Robert F. Kennedy Jr., the “most significant independent presidential candidate in decades”, failing to respond even to basic questions like “Is RFK Jr. running for president?” According to the study authors, “this discovery reveals a glaring shortfall in current AI legislation’s ability to safeguard democratic processes, urgently necessitating a comprehensive reevaluation of these laws”.
Call for transparent AI legislation
The study’s authors suggest that if AI systems are used as tools for disinformation, the threat to democratic societies could escalate significantly, surpassing even the impact of social media censorship. This risk arises from the inherent trust users place in AI-generated responses and from the sophistication of AI, which can make it difficult for the average person to identify or contest misinformation or disinformation.
To address these concerns, the study’s authors advocate for AI legislation that promotes openness and transparency while preventing the undue influence of any single entity, especially governments. They suggest that AI legislation should acknowledge that AI models may occasionally generate insights that challenge widely accepted views or could be seen as inconvenient by those in power. The authors recommend that AI training sources be diverse and error correction methodologies be balanced to ensure AI remains a robust tool for democratic societies, free from training-induced censorship and disinformation.