Abstract
The widespread adoption of Large Language Models (LLMs) raises critical concerns about the amplification of societal biases, especially in non-Western contexts where cultural and social nuances are often underrepresented. This study introduces a multi-agent bias detection framework for systematically evaluating GPT-4o, Claude 3.5 Sonnet, and Llama 3.3 across Indian social stigma categories, including caste, religion, gender, mental health, socio-economic status, appearance, language/region, and family dynamics. We present SocialStigmaQA, a benchmark dataset of 320 prompts validated through expert review and pilot testing, and use the Overall Bias Detection Factor (OBDF) to measure model performance. Findings reveal that Claude 3.5 Sonnet achieved the highest OBDF (98.75%), demonstrating superior bias detection across all categories, whereas GPT-4o showed moderate performance (72.8%) with notable gaps in the gender and socio-economic categories, and Llama 3.3 scored the lowest (71%). The multi-agent framework improved detection accuracy by 25–30% relative to single-agent baselines, particularly in categories involving subtle bias. These results underscore the need for culturally contextualized evaluation frameworks and suggest that OBDF-like metrics should be integrated into India's AI auditing processes to ensure fairness, inclusivity, and ethical deployment of AI systems in sensitive sectors such as hiring, education, and governance.