Race Is a Social Construct, and AI Does Not Know That

Race, Power and Algorithms: Critical Race Theory and AI

One of the most consequential mistakes in AI development is treating race as a stable, natural category, as if racial groups were discovered rather than constructed.

Race has no consistent biological basis. The genetic variation within racially defined groups is greater than the variation between them. The racial categories used in the United States differ from those used in Brazil, which differ from those used in South Africa, which differ again from those used in India. These differences do not arise because some countries have better science. They reflect the fact that racial categories are social and legal constructs, shaped by history, power, and politics.

This matters enormously for AI. When AI systems are trained on data that includes racial categories, as most large-scale systems are, they are being trained on socially constructed categories that vary across time and place, reflect historical power relations, and carry within them the accumulated effects of discrimination.

Consider a predictive policing algorithm trained on historical arrest data. That data reflects decades of racially discriminatory policing: communities of colour were over-policed, while white-collar crime in predominantly white communities was under-policed. The algorithm learns from this data and produces predictions that reflect it. It then directs more policing resources toward communities of colour, generating more arrests, which feed back into the training data and reinforce the original pattern. The algorithm did not create the discrimination. But it amplified and laundered it, converting historical human bias into an apparently objective mathematical output.
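The feedback loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a model of any real policing system: the function name simulate_feedback, the two areas, and every number are invented for the example. The sketch assumes both areas have the same underlying crime rate, that patrols are allocated in proportion to past recorded arrests, and that new arrests depend on how many patrols are present.

```python
def simulate_feedback(initial_arrests, true_crime_rate, patrols_per_round, rounds):
    """Return the history of cumulative recorded arrests per area.

    Hypothetical toy model: all inputs are illustrative assumptions.
    """
    recorded = list(initial_arrests)
    history = [tuple(recorded)]
    for _ in range(rounds):
        total = sum(recorded)
        # "Prediction": each area's share of patrols equals its share of past arrests.
        patrols = [patrols_per_round * r / total for r in recorded]
        # Arrests scale with police presence, not only with underlying crime.
        new_arrests = [p * rate for p, rate in zip(patrols, true_crime_rate)]
        # The new arrests are added to the record that drives the next allocation.
        recorded = [r + a for r, a in zip(recorded, new_arrests)]
        history.append(tuple(recorded))
    return history


# Area 0 starts with more recorded arrests only because it was policed more heavily.
history = simulate_feedback(
    initial_arrests=[60.0, 40.0],
    true_crime_rate=[0.1, 0.1],  # identical underlying rates (assumption)
    patrols_per_round=100,
    rounds=10,
)

first, last = history[0], history[-1]
share_first = first[0] / sum(first)
share_last = last[0] / sum(last)
gap_first = first[0] - first[1]
gap_last = last[0] - last[1]

print(f"Area 0 share of recorded arrests: {share_first:.2f} -> {share_last:.2f}")
print(f"Gap in recorded arrests: {gap_first:.1f} -> {gap_last:.1f}")
```

In this sketch the two areas never differ in actual crime, yet the historical disparity in the records never corrects itself, and the absolute gap in recorded arrests keeps growing. The data appears to confirm the original allocation in every round because the allocation is what produced the data.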

This laundering function is one of the most dangerous things AI systems can do. They take discriminatory social patterns, encode them mathematically, and present the results as objective. The subjectivity disappears. The discrimination remains.

A second problem arises when developers attempt to remove race from their models as a way of avoiding bias. This sounds sensible but often does not work. Race correlates with so many other variables (neighbourhood, school, income, postal code, name) that removing the explicit racial category while retaining correlated variables produces what researchers call proxy discrimination. The algorithm does not see race. But it sees everything that race predicts in a racially unequal society. The discriminatory outcomes persist.
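A small, hedged sketch can make proxy discrimination concrete. Everything here is hypothetical: the groups "A" and "B", the postal codes "N1" and "S9", the counts, and the function names are invented for illustration. The sketch assumes a residentially segregated population and a deliberately simple "model" that approves an applicant whenever the historical approval rate in their postal code is at least 50 percent. The group label is kept in the records only so the outcomes can be audited afterwards; the model never reads it.

```python
from collections import defaultdict


def build_records():
    """Hypothetical historical decisions as (group, postal_code, approved)."""
    records = []
    # Assumed segregation: group A lives mostly in "N1", group B mostly in "S9".
    records += [("A", "N1", True)] * 72 + [("A", "N1", False)] * 18
    records += [("A", "S9", True)] * 3 + [("A", "S9", False)] * 7
    records += [("B", "N1", True)] * 8 + [("B", "N1", False)] * 2
    records += [("B", "S9", True)] * 27 + [("B", "S9", False)] * 63
    return records


def fit_postal_code_model(records):
    """Learn the historical approval rate per postal code, ignoring group."""
    counts = defaultdict(lambda: [0, 0])  # postal_code -> [approved, total]
    for _group, postal_code, approved in records:
        counts[postal_code][0] += int(approved)
        counts[postal_code][1] += 1
    return {code: a / n for code, (a, n) in counts.items()}


def predict(model, postal_code, threshold=0.5):
    """Approve when the postal code's historical approval rate meets the threshold."""
    return model[postal_code] >= threshold


def approval_rate_by_group(records, model):
    """Audit step: group is used here only to measure outcomes."""
    totals = defaultdict(lambda: [0, 0])  # group -> [approved, total]
    for group, postal_code, _approved in records:
        totals[group][0] += int(predict(model, postal_code))
        totals[group][1] += 1
    return {group: a / n for group, (a, n) in totals.items()}


records = build_records()
model = fit_postal_code_model(records)
rates = approval_rate_by_group(records, model)

for group in sorted(rates):
    print(f"Group {group}: predicted approval rate {rates[group]:.0%}")
```

With these invented numbers, the model approves most applicants from group A and few from group B, although race was never an input. Postal code carries the information instead. The audit is only possible because the group label was retained alongside the data, which is one reason that simply deleting the racial category can make discrimination harder to detect rather than less likely to occur.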

Neither encoding race nor excluding it resolves the problem. The problem is structural: it lies in the social inequalities that produce the data in the first place. Addressing it requires structural thinking, not just technical adjustment.

Reflection question: Can you think of a variable that seems race-neutral but might serve as a proxy for race in your organisational context? What would it mean to take that seriously?