20 years ago, when I was in high school, we were asked to write a simple algorithm – and backed away once we understood what that meant. Our physical education teacher no longer wanted to grade students by the absolute height they jumped but by the individual progress they made in comparison to others, also factoring in their body weight, height, and gender. And, being curious and progressive back in the 1990s, when nobody we knew was talking about algorithms yet, he asked us math majors to develop a formula for that.
Social justice in sports – how cool was that!? Well, until we felt the awkward responsibility that had been bestowed on us. Controlling for gender was easy, but how heavily should we factor in weight and height – how much more difficult was it for a short, heavier student to clear that high jump bar than for a tall, slim one? Or was it? Also, would that mean everybody would be asked their weight? And shouldn't we introduce more variables, like a past knee injury or nearsightedness? We argued and shifted numbers around, but it just didn't seem right, despite the teacher's good intentions. We refused to arbitrarily write up a golden rule; our teacher would have to continue to rely on his judgment and justify his decisions. For the first time, we had a premonition that mathematics can be destructive if it is done with exaggerated positivism and ethics is kept out of the equation.
Of course, we had no idea to what extent such formulas would be used and abused a few years later – and most often not with the goal of maximizing fairness, but profits. Our little formula would have done little harm in comparison: some students might have felt treated more fairly, but others might have received undeservedly bad grades. And some might have started to tweak their data – more weight, a lower previous result – to make their progress look better and harder-earned.
Two decades later, all high school students know how ambivalent algorithms are. The US credit score algorithm, for instance, traps people in a cycle of poverty and reinforces social inequalities in the US, as discussed in my post last week. But why do so many algorithms discriminate, and what can we do about it?
Dirty data can lead to algorithmic bias
Even with good intentions, an algorithm can be biased if the data it is trained on is flawed – incomplete, unbalanced, or poorly selected. And this is not a rarity: according to one study, if you train an algorithm on everything written on the Internet – as has been done with Google Translate – it will develop prejudices against black people and women.
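To make this mechanism concrete, here is a minimal, hypothetical sketch in Python (using scikit-learn; the groups, numbers, and labels are invented for illustration and have nothing to do with the study above). A model is trained on historical decisions in which one group is underrepresented and its positive outcomes are partly mislabeled. The code contains no explicit rule against that group, yet the trained model ends up rejecting its qualified members more often:

```python
# Toy illustration of bias inherited from flawed training data.
# All data here is synthetic and the setup is deliberately simplified.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group(n, label_noise):
    """Simulate applicants: a 'skill' feature drives the true label, but the
    historical labels used for training flip some positives to negatives."""
    skill = rng.normal(size=n)
    truly_qualified = (skill + rng.normal(scale=0.5, size=n)) > 0.0
    recorded = truly_qualified & (rng.random(n) > label_noise)
    return skill, truly_qualified, recorded

# Group A: well represented, clean labels. Group B: small and noisily labeled.
skill_a, true_a, rec_a = make_group(5000, label_noise=0.0)
skill_b, true_b, rec_b = make_group(500,  label_noise=0.4)

# Group membership enters as a feature (in practice often via proxies).
X = np.column_stack([
    np.concatenate([skill_a, skill_b]),
    np.concatenate([np.zeros(len(skill_a)), np.ones(len(skill_b))]),
])
y = np.concatenate([rec_a, rec_b])  # biased historical decisions

model = LogisticRegression().fit(X, y)

# Evaluate against the *true* qualification, separately per group.
pred = model.predict(X)
truth = np.concatenate([true_a, true_b])
group = X[:, 1]
for g, name in [(0, "Group A"), (1, "Group B")]:
    mask = group == g
    fnr = np.mean(pred[mask][truth[mask]] == 0)  # qualified people rejected
    print(f"{name}: false negative rate = {fnr:.2f}")
# The model reproduces the historical bias: qualified members of Group B are
# rejected far more often, without any explicit rule against them in the code.
```

The point of the sketch is that the bias lives entirely in the data: fixing it requires fixing the training set (or correcting for it), not just inspecting the code.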
Header image: (c) Max Duzij on Unsplash