Why is pattern recognition not racism?


If a human generalizes a bad action based on race, it's called racism.

But if a machine performs the same generalization, like inferring a bad action from a race input, it's not called racism. Why?

Specifically, the machine I am talking about is machine learning, where it learns from input and output pairs, returning a model.
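For concreteness, a minimal sketch of that "input and output pairs, returning a model" idea (all data here is invented; scikit-learn is used purely for illustration):

```python
# A minimal sketch of supervised learning: fit a model on (input, output)
# pairs, then use it to infer an output for a new input. All data invented.
from sklearn.linear_model import LogisticRegression

X = [[0.2, 1.0], [0.4, 0.9], [3.1, 0.2], [2.8, 0.1]]  # inputs (features)
y = [0, 0, 1, 1]                                       # outputs (labels)

model = LogisticRegression().fit(X, y)  # "learning" step: returns a model
print(model.predict([[0.3, 0.8]]))      # inference on an unseen input
```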

Moreover, if it uses deep learning, even the AI designers themselves cannot understand how the model infers. The Wikipedia article on Explainable AI states:

XAI counters the "black box" tendency of machine learning, where even the AI's designers cannot explain why it arrived at a specific decision.

So, why?

In a narrow sense*, racism is an intentional act, belief or thought of a conscious mind (possibly stemming from subconscious racial biases that one may not be aware of).

Computers aren't considered to be conscious. So inferences that AI draws from data can't be said to be racist in the same sense.

But AI can have racial biases, and biases towards other things (whether that's due to bad data or due to some statistically-significant correlation). This is something a lot of AI developers are aware of and try to counter (by e.g. not including race as one of the features of the model).
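As a rough sketch of that mitigation (all column names here are hypothetical), one simply drops the sensitive attribute before training; note that, as later answers discuss, this alone does not remove proxy effects:

```python
import pandas as pd

# Hypothetical applicant data; the sensitive attribute is dropped before
# training so the model never sees it directly.
df = pd.DataFrame({
    "income":    [30_000, 85_000, 52_000],
    "race":      ["A", "B", "A"],   # sensitive attribute (made up)
    "defaulted": [1, 0, 0],         # label
})

X = df.drop(columns=["race", "defaulted"])  # features without race
y = df["defaulted"]
# ...fit any model on (X, y); this does not by itself remove variables
# that are merely correlated with race.
```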

We don't want AI to have e.g. racial biases, because we don't want people to be treated differently based on things they can't control (like race). This is especially the case if the data is bad or the differences between races are small, or if there are some confounding variables.

See also: AI ethics.

(This is possibly a better question for the AI Stack Exchange site.)

* In a broader sense, e.g. "systemic racism" refers to discrimination due to policies and practices of a society or organisation, which may not necessarily (though it often does) reflect an intentional, directly racially motivated act of any given person in the system. But one could argue that such a system still involves many intentional acts stemming from these policies, and the policies themselves may have originated from more direct racism.

Ultimately, whether racism requires consciousness is a semantic distinction. There is some benefit to limiting the definition to intentional acts, to not water down the meaning. There may also be some benefit in associating intentional acts with non-conscious policies and machines, to emphasise that similar harm may be caused in both cases.

If a human generalizes a bad action based on race, it's called racism.

First of all, there's no single and universal understanding of how "racism" should be defined, but by social scientific standards, that's not it. What you describe here is negative racial stereotyping.

Moreover, there's an important difference between individual and structural racism. If a person harbors and acts upon conscious racial stereotypes, we generally can agree that this individual person is racist. But structural racism (segregated and inferior housing, mass incarceration, etc.) can theoretically exist in a world where racial stereotypes are absent or at least have no causal power.

But if a machine performs the same generalization, like inferring a bad action from a race input, it's not called racism. Why?

With this understanding, it should be unsurprising that this premise is objectively false. People (for example, those involved in the Algorithmic Justice League) have demonstrated that structural racism is evident in machine learning applications.

ML/AI models aren't generally "racist" in the "She hates him just because of his skin color" sense. They just want to do the very best job of predicting an unknown variable from a set of known variables. However, because of the tendency of historical and current racism in society to produce widespread systemic effects in various and sundry ways, the concept of "race" can tend to appear in unexpected places.

It is well-studied that race and ethnicity are actually poor predictors of pretty much everything except race/ethnicity, once other relevant variables are accounted for. Such variables include, for example, "socioeconomic status" - race can be used as a predictor for things related to socioeconomic status, because socioeconomic status (in the U.S.) is correlated with race. However, if one actually has access to socioeconomic status, then it is a much better variable to have in basically every context, with the narrow exception of when one is specifically studying race/ethnicity.

As an example, suppose we would like to predict the grade with which a cohort of prospective students will complete a course of study. All we know about these students is their race. It turns out that we will be able to do a kind of okay job predicting grades using our race variable. However, this is not because race has a causative effect on academic success - it is because socioeconomic status (and some other things) have a loose correlation with race, and these variables are strong causative agents for academic success.
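A minimal simulation of this point, with all numbers invented: the synthetic "race" variable has no causal effect on grades, yet predicts them somewhat, because it is loosely correlated with the SES variable that does the causal work:

```python
# Synthetic simulation of the students example: "race" has no causal effect
# on grades; it is merely correlated with socioeconomic status (SES), which
# does drive grades. All numbers are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
race = rng.integers(0, 2, n)                        # two made-up groups
ses = rng.normal(loc=race * 0.8, scale=1.0)         # SES loosely tied to group
grade = 2.0 * ses + rng.normal(scale=2.0, size=n)   # grades caused by SES only

# Group means differ, so "race" alone predicts grades "kind of okay"...
print(grade[race == 0].mean(), grade[race == 1].mean())
# ...but SES correlates with grades far more strongly than group does.
print(np.corrcoef(race, grade)[0, 1], np.corrcoef(ses, grade)[0, 1])
```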

Now, this sword cuts both ways:

1. A model that explicitly includes race as a variable can make predictions along racial lines, even though race is only a proxy for the real causes;

vs

2. A model that excludes race, but includes variables correlated with race (like socioeconomic status), can still end up making predictions that break down along racial lines.

Number 2 is actually the more nefarious of the two, because it's "hidden", in the sense that we didn't even include a "race" variable in our model. To illustrate using our students example: we gather only "socioeconomic status" as a variable to predict student success. Our model ends up predicting high success rates for wealthy students and low success rates for poor students. This likely results in predicting higher success rates for white students versus, say, Latino students, because of the socioeconomic disparity. Thus, despite the fact that race was not considered as a variable in the model, our model makes determinations along racial lines because of existing systemic disparities.
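To make the hidden variant concrete, a small sketch (again with invented numbers) in which the model is fit on SES alone and never sees race, yet its predictions still split along group lines:

```python
# The model is trained on SES alone and never sees race, yet its
# predictions differ by group, because SES itself differs by group.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 10_000
race = rng.integers(0, 2, n)                 # never used as a feature
ses = rng.normal(loc=race * 0.8, scale=1.0)  # SES correlated with group
grade = 2.0 * ses + rng.normal(scale=2.0, size=n)

model = LinearRegression().fit(ses.reshape(-1, 1), grade)
pred = model.predict(ses.reshape(-1, 1))

# Predicted success splits along group lines despite race being excluded:
print(pred[race == 0].mean(), pred[race == 1].mean())
```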

This is a good question which, IMHO, reduces almost entirely to whether or not the noted systemic disparities are a moral dilemma. In other words, your moral judgements about this problem are probably identical to whatever your moral judgements are about how systemic socioeconomic disparities break down along racial lines.

This depends on your answer to the moral dilemma: when predicting student success rates, are we also obliged to correct systemic socioeconomic (and other) biases? This is a harder problem than the above, because it potentially imports a duty to fix one moral problem into various and sundry unrelated problems. To wit, it is very non-obvious that predicting which of several prospective students would succeed in a course of study involves the question "should we take into account the effects of historical racial injustice?" - these seem, on their face, to be totally unrelated problems.

If your answer is "no, let's not tackle historical racial injustice, let's just predict success rates," then your predictions along socioeconomic lines will assign low success rates to poor students, which may then perpetuate said historical injustices.

Suppose, OTOH, that one DOES wish to tackle historical racial injustice. Since a "good" model that takes into account socioeconomic status will tend to weight success more heavily for some races, really the only way to fix this is to adjust the post-model results to normalize based on race. This is called "affirmative action" and is itself a contentious moral topic (in the U.S. at least).

You ask:

Why is pattern recognition not racism?

Pattern recognition is simply searching for patterns among data, and is a class of algorithmic and heuristic computation. Most generally, it is possible to program a computer to judge whether or not a pattern exists. For instance, given a sequence of integers, it is possible to decide whether or not the sequence is monotonic. Or in logic, given a set of truths, it is possible to decide whether a further truth follows deductively from them. Thus, programming computers to recognize patterns embodies an epistemological judgement (SEP). Of course, pattern recognition tends to use complex statistical methods consistent with strategies commonly used in data science and machine learning.
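For instance, a minimal decision procedure for the monotonicity example, written in Python for illustration:

```python
def is_monotonic(seq):
    """Decide whether a sequence of integers is monotonic,
    i.e. entirely non-decreasing or entirely non-increasing."""
    pairs = list(zip(seq, seq[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

print(is_monotonic([1, 2, 2, 5]))  # True
print(is_monotonic([3, 1, 4]))     # False
```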

Racism on the other hand is a type of bias in which one's prejudices are based on the somewhat malleable notion of race or ethnicity. From WP:

Racism is discrimination and prejudice against people based on their race or ethnicity. Racism can be present in social actions, practices, or political systems (e.g. apartheid) that support the expression of prejudice or aversion in discriminatory practices.

What it means to have a race or ethnicity is a little complicated, because race tends to be a socially constructed category as opposed to a biologically scientific one. For instance, to be "black" at one point in the US was determined by the one-drop rule. Today, being "black" is often associated more with being a part of African-American culture. Race is a philosophically complicated and contentious topic, as given in Race (SEP):

This historical concept of race has faced substantial scientific and philosophical challenge, with some important thinkers denying both the logical coherence of the concept and the very existence of races. Others defend the concept of race, albeit with substantial changes to the foundations of racial identity, which they depict as either socially constructed or, if biologically grounded, neither discrete nor essentialist, as the historical concept would have it... Both in the past and today, determining the boundaries of discrete races has proven to be most vexing and has led to great variations in the number of human races believed to be in existence.

So, pattern recognition is not racism; however, pattern recognition can exhibit racism. This is a very important distinction. Pattern recognition of numbers in number theory will never be racist, because it judges based on categories (even, odd, monotonic, etc.) that have nothing to do with race. But if pattern recognition is applied to categories that do involve race and ethnicity (such as deciding whether a person is of "race X" or "race Y" and attributing characteristics to such categories), then it has the POTENTIAL of exhibiting racist judgement. That turn of words cannot be stressed enough. Since race is itself a complicated claim (races are roughly thought to be natural kinds), the notion of racist algorithms turns on the concept of race.

As is to be expected, the intersection between complex data processing and statistical methods (which are mathematical models) and race and racism (which are categories and claims about the world) is itself fraught with conceptual peril. ML algorithms and data sets always have to be scrutinized for several sorts of bias, and some of those pertain to race under the right conditions. Here's an example of research showing that text produced by language models likely has "racist" language based on dialect (nature.com).

The question you raise is very complicated, because race, statistical bias, and mathematical modeling are three distinct and conceptually complicated disciplines. But it's an important topic because of its ethical dimensions. For a good read on the broad ethical implications of the use of pattern recognition by corporations, check out Atlas of AI by Crawford. It's not devoted to racism, but it shows you how biased judgment in automated systems has a major impact on the real world.

An AI trained with data produced by a given society will reflect the biases and values of that society. If that society is racist, the AI will be factually racist. This has been an ongoing concern for years. This does not mean it is malevolent; after all, it has no volition whatsoever. It is simply that it reflects and exposes the biases in the data it has been trained with. (As an aside, this mechanism is not restricted to artificial intelligences, because this is a kind of racism frequently found in humans in a modern society: Few people aside from Nazis are openly and consciously malevolent; their beliefs and behavior simply reflect their upbringing in a given environment, personal and societal.)

In this sense (discrimination by "race"), of course AIs can be racist, reflecting the racist society that provided the data they were trained with.

The other answers explain well enough why racially biased machine-learning results are not themselves a sign of racism, even though applying that data would be ethically problematic.

For humans, the issue is that this is a sensitive and political topic whose existence contemporary people in civilization cannot in any way pretend away. Also, the effort to avoid all kinds of statistical biases and invalid inferences requires lots of training that most people just never got.

That's why, by default, an average person should not even attempt to correlate anything with people's genetic inheritance, and if they do, their motivation and self-restraint must be questioned ethically.

However, science does consider genetic differences, such as evaluating statistical data on skin cancer in people with white skin in equatorial regions, or vitamin D deficiency in people with darker skin in regions near the poles. As such, it is possible for humans (distinguished scientists) to seek patterns in data related to human genetic makeup without acting immorally.

If a human generalizes a bad action based on race, it's called racism.

Not quite. Racism is the belief that you can divide the human species into distinct subgroups with meaningful, permanent and distinct properties. That would be "scientific racism". That being said, science has moved away from that claim considerably.

Sure, you can group people by skin color, and historically people did. Is that distinct? Not really: there are tons of shades of brown blending into each other. Is it permanent? Not really: people's skin color varies with exposure to the sun or the lack thereof, in the really short term (sunbathing), over the intermediate term (a lifetime), and over the longer term of evolution and gene inheritance. Is it meaningful? Not really: it largely tells you what you already see, and with regard to inferring ancestry, genes, origin, culture and whatnot, it's at best a weak hint at some connection with a place of high or low sun exposure. That place could be anywhere, and that connection could sit anywhere within your extended family tree, having more, less or no influence on the individual.

Not to mention that even if you picked a more useful feature, like genes or disease resistances, you'd still run into the problem that people within such a group wouldn't be homogeneous enough, and people outside the group wouldn't be distinct enough, to make it a useful distinction. And such groups wouldn't align with the "races" that were made up in the infancy of modern science, either...

So past attempts at defining races were completely misguided and unscientific (by modern standards); currently there is no feature which allows for (realistically enumerable) races, and whether there is one at all is questionable.

That holds both scientifically (one cannot find such groups, and arbitrarily made-up groups aren't useful for gathering knowledge) and ethically (discrimination, in the sense of dividing a group, usually comes with ranking, hierarchies and discrimination in the sense of treatment).

Also, historically the term "race" was used rather loosely; ask a racist and they might make comparisons like vipers and cobras, which are actually quite far apart in the taxonomy. With regard to humans, we've got 99.9% genome similarity across the species, so at best we'd be talking subspecies: a difference so small that it's an optional category in the taxonomy. So even if races existed, they would not be the big thing that early proponents of racism thought they were.

So when people talk about racism, they rarely mean the scientific grouping of people based on actual facts. What they refer to is a set of ideologies that divide people into supposed races, assign value and rank to those groups, and justify treating people differently on that basis.

Which, in combination with the lack of actual scientific backing, usually just boils down to "arbitrary" mistreatment of people (in the sense that it might be systemic and based on features, but doesn't relate to any meaningful attributes and is rather fictitious).

And furthermore, you might use it to refer to political systems and movements implementing or proposing racist goals. Because while individual actions can be motivated by racism or follow unjustified prejudices, unless they tell you, it's hard to tell whether an individual is a weirdo, a dick, holds a particular prejudice, or has generalized that prejudice into an ideology, building a worldview around it. So you might have ideas such as "racist is an adjective, not a noun", condemning an action, not a person. Racism as a system, meanwhile, would point to a more systemic, ideological and political construct that involves not just individual prejudice and malice, but the political power to define a race (not just for oneself, but for everyone, including the people you talk about), to make that definition omnipresent, to assign value to that definition, and to act upon that characterization.

So depending on whether people want to fight racism where it starts, the label might already be applied to the individual; or, if they don't want to stigmatize people and thus push them into a corner, they might use it for organized systems rather than for people.

None of that applies to AI, which is at best capable of "scientific racism", that is, finding a pattern within a group.

So the biggest risk in that regard is that what the AI finds is a prejudice rather than a well-reasoned judgement, which is very likely and precisely what it's doing. I mean, these are statistical prejudice machines that first and foremost assume that correlation is causation... That's what makes them so successful (that heuristic is better than the opposite, yet obviously not perfect either). So it might often take lots of data, and to some extent the ability to build more complex models, to avoid those pitfalls.

So AI isn't racist, but it is very much prone to prejudices, and one should take its results with a grain of salt in that regard, which is something a lot of people on the hype train seem to lack.

The other problem is that the data with which we feed it might not be neutral. The thing is, racism claims that the differential treatment of people is necessitated by their differences, but often enough it's the other way around, in that differences between groups are often the result of differential treatment. So in some regard it's the racist who creates the race, and not the other way around.

The problem is that if you apply a simple model seeking correlations, then it doesn't matter whether A causes B or B causes A: A and B happen simultaneously and are thus related. So depending on the data and the model, it's very likely that you build yourself a confirmation bias that takes input with a racist bias and learns how to produce output that looks similar to it.
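A small sketch of that loop (synthetic data throughout): the training labels encode biased past decisions, and a correlation-seeking model dutifully learns to reproduce the bias:

```python
# The training labels encode biased past decisions; a model fit on them
# faithfully learns and reproduces that bias. All data is invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5_000
group = rng.integers(0, 2, n)
skill = rng.normal(size=n)  # the genuinely relevant attribute
# Historical decisions: partly skill, partly plain bias against group 1.
past_ok = (skill - 0.7 * group + rng.normal(scale=0.5, size=n) > 0).astype(int)

X = np.column_stack([skill, group])
model = LogisticRegression().fit(X, past_ok)

# The learned weight on the "group" feature comes out clearly negative:
# the model has encoded the historical bias and will project it forward.
print(model.coef_)
```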

There's no agenda on the machine part, no political motivation to mistreat people, no perception of what people are and what that data refers to in the first place, but you might reproduce that data and thus create output that can be read by people as confirmation of their racist narrative.

However, for obvious reasons, your strategies for approaching a racist wouldn't work on the machine. So it might not make much sense to think of it as racist.

Why is pattern recognition not racism?

You are asking a "why" question, implying that the sentence after the "why" is true.

This is immediately challenged by the fact that all major GPT/LLM models contain features that are very consciously implemented by the developers to combat racism or other biases in the generated models.

Specifically, modern LLMs are first trained by processing an unfathomably large number of input documents, but a very important second step is further training by a process called Reinforcement Learning. I.e., the original model is further modified by, basically, feeding it prompts and actively teaching the model which answers are "bad" (be it racist, sexist, or information on how to build bombs at home, or whatever the creator of the model considers inappropriate).

Interestingly, this step is even important for just improving the quality of the model. The example that stuck with me the most is that if you only train the model on input data and then don't do anything else, you will get conversations like this:

Q: Can you solve this homework problem for me?
A: Don't forget that the assignment is due by Friday!

(Simply because, statistically, most real homework assignments you would find on the net will contain a remark about the deadline.)

In this context, it is important to keep in mind that what we call "AI" these days, i.e. GPTs or LLMs, does not in any way, shape or form understand what it is saying. Yes, taken as a black box, or in the sense of a Turing Test, they are incredibly good at faking aspects of intelligence; and an astute philosopher might argue that it does not matter how a black box arrives at an "intelligent" statement -- if it does, then it is intelligent. But still. There is nothing within this particular black box of LLMs that would be capable of having opinions or convictions, or of intelligently reasoning (beyond figuring out what the best next words to output are), or anything else where it would make sense to call the GPT "racist".

Labeling it "racist" would just be a short-hand form for "this LLM has been trained without making sure that racist biases in the input data were filtered out in the process". And this is not only possible, it did happen, and it is actively combatted by the devs at very large cost. So yes, an LLM can be "racist" in some sense of the word.

In other words: the LLM contains what you feed in. If you train a model on the top-100 list of known racist books, combined with the extract of only openly racist internet forums, and any kind of racist content you could get your hands on, you will get a "racist" LLM. The LLM is still just a big data structure with nothing whatsoever like a "self" or an "opinion" or "beliefs" or anything along these lines. It is not an "entity" in any way, shape or form. But every answer will be dripping with typical racist statements.
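As a toy illustration of "the LLM contains what you feed in", here is a minimal bigram generator (not a real LLM, but the same in spirit): it replays the word-to-word statistics of whatever corpus it is given, biases included:

```python
# A toy illustration that a language model "contains what you feed in":
# a bigram model replays the word-to-word statistics of its training text,
# so its output inherits whatever the corpus contains, biases included.
import random
from collections import defaultdict

corpus = "the model repeats the patterns in the corpus the model saw".split()

following = defaultdict(list)          # "training": record successor words
for a, b in zip(corpus, corpus[1:]):
    following[a].append(b)

random.seed(0)
word, output = "the", ["the"]          # "generation": sample next words
for _ in range(8):
    if word not in following:
        break
    word = random.choice(following[word])
    output.append(word)
print(" ".join(output))
```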

A more realistic case would be if you used, say, a large number of books from older times, say up to the 1930s, in your training. Open racism and sexism were relatively common back then; you'll find plenty of books by quite successful authors which freely used racist, sexist, etc. themes, and few batted an eye at it back then. It was only really from the middle of the 20th century, and obviously in many places even later (e.g., open segregation in the U.S. or South Africa), that it stopped being generally accepted. This case is especially important since those old books are usually free from copyright (cf. Project Gutenberg), which is an increasingly important topic in the training of AIs.

All of that will end up somewhere in the LLM. The LLM, or rather the GPT (literally "Generative Pre-trained Transformer"), does nothing but generate, based on its model, words that fit what came before. Whatever bias sneaked into your training material, and wasn't cleaned out through later Reinforcement Learning, will be in there, and in the answers; and this would not be accepted by the public (as has been demonstrated by early models which quickly disappeared when they gave less than great answers in that regard).

To be racist, one must willingly and consciously assign ethnic traits to a certain behavior pattern, thus accepting it as a reality. The objective of being racist toward someone is, normally, to depreciate, ascertain dominance, demonize, and humble potential adversaries at war or a group of people with a common trait.

Racism is also the act of justifying supremacy by claiming one ethnicity has superior characteristics to others. Pseudoscience and political propaganda play a major role in this scenario.

That said, since AIs are still glorified search engines so far, capable of giving you elaborate, complex, tailored answers and outputs but not of passing the Turing Test, they don't have our life experience to truly understand what it is to be black, white, or red and how such people live; they only have the gazillion bytes of data poured daily onto the internet. AIs still need to reach the singularity to gain consciousness as we know it. Only then may racist AIs be a possibility, and a real one.
