Why Are We Talking About Bias?

The views in this article are my own and are based on my experiences. If you disagree, I hope you will add a comment so we can have a healthy discussion and all learn more.

In the previous three articles, we covered how bias influences us personally and how we must be aware that bias is neither good nor bad in itself; that comes from the context. A woman walking home late at night, clutching her keys between her knuckles to fight off a bad man, is inherently the same as an interviewer turning down equal candidates based on gender [https://www.linkedin.com/pulse/considering-bias-healthy-discussion-geordie-consulting]. Our second article examined how bias moves from a personal to a group context, exploring how that becomes recursive (who knew a coffee shop could be so…) [https://www.linkedin.com/pulse/dynamics-group-bias-geordie-consulting]. Finally, we looked at the impact of group bias when it starts to rule an organisation (remember, personal bias is shaped by group bias) [https://www.linkedin.com/pulse/impact-organisational-bias-geordie-consulting].

So why do we, a consultancy dealing with the Microsoft Analytics stack and Power Platform, care about bias? The truth is that because bias shapes all it touches, it shapes analysis. One concept at the core of what we work to impart to our clients as they look to become more data-driven is that of entities, or objects. The Kimball schemas (Fact and Dimension tables) that form the basis of enterprise-grade solutions must be protected from bias. They can also highlight where bias is at play.

So let us look at what we may want to put into a Person table (I’ve seen similar for Customer, Member, and Staff tables) within organisations. We can all agree that the Name and Contact details are reasonable, but this can quickly take a turn. Is Gender relevant for most of the analysis (if not all of it)? Is Ethnicity bringing value? It is easy to say, “It could be important, so we should have it”, but what happens when you seek to identify correlations? These are, after all, the cornerstone of analytics. Find a correlation, and you can then fit the linear equation for that model and extrapolate it to make predictions!
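The “find a correlation, fit a line, predict” step can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the two columns of numbers are invented, and the helper names `pearson_r` and `fit_line` are hypothetical, not part of any tool we use. Note that nothing in the arithmetic checks whether the relationship is causal.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented numbers standing in for any two attributes in a model.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(xs, ys)
slope, intercept = fit_line(xs, ys)

# Extrapolating beyond the observed data: the maths will happily do it,
# whether or not x actually causes y.
predicted_at_10 = slope * 10 + intercept

print(f"r = {r:.3f}, y = {slope:.2f}x + {intercept:.2f}")
print(f"prediction at x = 10: {predicted_at_10:.2f}")
```

The point of the sketch is how little it takes: any two columns that happen to move together will produce a confident-looking equation.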

“Correlation does not equal causation!”

The assumption taken from correlations is that they show a trustworthy relationship between two factors, like a cause and an effect. That “cause and effect” can then be used to extrapolate further “insights” based on the initial finding. It is also not uncommon for the effect to lead to the cause. The phrase “All blondes are thick” is an excellent example of group identity leading the scenario: the “blondes” in question are not all natural, and the population being examined is never the whole; instead, it is a subset doing something to “offend” the person making the claim. So perhaps the saying should just be “Young and dumb.” Looking back at my life, I see a lot of “dumb” when I was younger… I was less blonde (except for that short-lived period at University).

The issue is that if we had “Hair Colour” in our Person table, we would look for correlations between that and any other tables we added. Is it appropriate to answer the question, “What colour product do Black men most buy?” Most importantly, if our analysis did find an answer, would we have a causal relationship, so that we could say, “If you want to attract more Black men to your retail environment, make sure products are this colour”? The madness becomes apparent quickly, yet that is precisely how all too many stats, including government stats, work. Bias builds the tables that are then used to find correlations, which are then deemed causal and used to extrapolate the next set of findings.

As our data models become bigger, it is all too easy for seemingly trivial attributes (First Name, Gender or Ethnicity) to shape our analytics. You can, after all, only become a data-driven organisation when you recognise the traditional biases that your organisation applies and are prepared to approach them with an open mind. Remember, just because a relationship is not causal does not mean it doesn’t exist. Power companies know that at breaks in popular soaps there is a power surge as people put their kettles on to have a cup of tea (at least in the UK). Does that mean that every power spike in the UK is due to soaps? I hope you would say “No!”. Yet it is possible to gauge how popular different showings of soaps are from the local spikes.
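The kettle example can be sketched with a handful of made-up observations. Everything here is an assumption for illustration: the slots are invented, not real grid data, and `share` is a hypothetical helper. It shows the two questions pulling apart: how often a soap break comes with a spike, and how often a spike comes with a soap break.

```python
# Invented half-hour slots for one evening.
# Each tuple is (soap_break_in_slot, power_spike_in_slot).
slots = [
    (True, True),
    (True, True),
    (True, True),
    (False, True),   # a spike with no soap on: something else caused it
    (False, True),
    (False, False),
    (False, False),
    (False, False),
]

def share(rows, condition, outcome):
    """Share of rows meeting `condition` that also meet `outcome`."""
    matching = [row for row in rows if condition(row)]
    return sum(1 for row in matching if outcome(row)) / len(matching)

# Of the slots with a soap break, how many had a spike?
spike_given_break = share(slots, lambda r: r[0], lambda r: r[1])

# Of the slots with a spike, how many had a soap break?
break_given_spike = share(slots, lambda r: r[1], lambda r: r[0])

print(f"spike when there is a soap break: {spike_given_break:.0%}")
print(f"soap break when there is a spike: {break_given_spike:.0%}")
```

In this toy data every soap break comes with a spike, yet a good share of spikes have nothing to do with soaps. The relationship is real and useful, but it only runs one way.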

 

#geordielife #dataliteracy #datadrivendecisionmaking