With data powering more and more decision-making, questions around the collection and use of data in the public and private sectors abound. While data may seem like the best basis for objectivity because of its quantitative nature, it can still be inaccurate, biased, and harmful. Data bias can skew conclusions built on data insights, and in doing so, it can damage the relationships between governments and their citizens and between businesses and their customers.
Data bias is the exclusion of, or preference for, certain data elements over others in the same data set, and it can arise from many sources.
What’s the Big Deal with Data Bias?
Data bias is dangerous: it opens organizations up to liability, and it perpetuates discrimination – at times, to deadly ends.
Amazon’s hiring AI system was one example of data bias creeping into decision-making: the data selected to power the hiring algorithm led it to discriminate against women. Why? Amazon’s team trained its hiring AI on a set of resumes from people who already worked at Amazon. That might not have been an issue had the population of Amazon employees and applicants not been overwhelmingly male and privileged. As a result, the hiring AI was predisposed toward resumes from privileged, male applicants. For instance, when the AI encountered resumes that listed women’s colleges like Smith or HBCUs like Howard, it treated those applicants as less desirable than applicants who had attended similarly ranked schools.
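To make that failure mode concrete, here is a minimal, hypothetical sketch – not Amazon’s actual system – of how a model trained on historical hiring outcomes absorbs whatever bias those outcomes encode. The resume snippets, labels, and the choice of scikit-learn’s CountVectorizer and LogisticRegression are all illustrative assumptions.

```python
# Hypothetical toy example (not Amazon's actual system): a model trained on
# historical hiring outcomes inherits whatever bias those outcomes encode.
# The resumes and labels below are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Historical labels skew against resumes mentioning women's colleges or
# HBCUs, mirroring a hiring record dominated by privileged, male applicants.
resumes = [
    "software engineer java aws",              # hired
    "backend developer python aws",            # hired
    "software engineer smith college python",  # rejected
    "data analyst women in tech java",         # rejected
    "devops engineer kubernetes aws",          # hired
    "developer howard university python",      # rejected
]
hired = [1, 1, 0, 0, 1, 0]

vectorizer = CountVectorizer().fit(resumes)
model = LogisticRegression().fit(vectorizer.transform(resumes), hired)

# Tokens that appear only in historically rejected resumes ("smith",
# "howard", "women") pick up negative weights, even though they say
# nothing about a candidate's ability.
for word, weight in sorted(zip(vectorizer.get_feature_names_out(),
                               model.coef_[0]), key=lambda pair: pair[1]):
    print(f"{word:12s} {weight:+.3f}")
```

The point of the sketch: the bias lives in the labels, not in the algorithm. Any model fit to a skewed hiring record will reproduce that skew unless someone intervenes.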
Another algorithm laced with bias: predictive policing, intended to reduce human bias in crime prediction. Instead, it reflected both bias in the tools applied and bias through prejudice. The algorithm was trained on historical crime statistics and police reports, which happened to be higher in neighborhoods populated by racial minorities, and so it targeted those populations. Systemic racism is one possible reason those neighborhoods generated more police reports than others: they may already have been under heavier scrutiny, which led to a larger police presence and, consequently, more police reports. Continuously training a machine on this type of data can produce a biased algorithm that predicts high crime rates in minority neighborhoods, an instance of bias in analytical tools. With minority neighborhoods and police already in a complicated and tense relationship, a biased algorithm can reinforce lethal stereotypes and deepen distrust.
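That feedback loop can be shown with a small, invented simulation. Two neighborhoods are assumed to have identical underlying crime rates, but one starts with more recorded reports; patrols follow the reports, and crimes are only recorded where patrols go. Every figure below is an assumption made up for this sketch, not real crime data.

```python
# A minimal, invented simulation of the predictive-policing feedback loop.
# Two neighborhoods have the same true crime rate, but "A" starts with more
# police reports. Patrols are sent where past reports are highest, and
# incidents are only recorded where officers patrol.

true_crimes_per_year = {"A": 50, "B": 50}   # identical underlying rates (assumed)
reports = {"A": 30, "B": 20}                # A starts over-reported (assumed)

for year in range(1, 6):
    # "Predictive" step: concentrate patrols in the highest-report neighborhood.
    target = max(reports, key=reports.get)
    # Crimes are only recorded where officers are present.
    reports[target] += true_crimes_per_year[target]
    print(f"year {year}: {reports}")

# A's report count grows every year while B's never moves, so the model
# keeps "predicting" A as the high-crime neighborhood.
```

The recorded disparity grows every year even though nothing about the underlying neighborhoods differs, which is what makes “the data said so” such a misleading defense.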
If a system has been built for a specific population and overlooks minorities, the data it produces will likely do the same. Data should be mindfully collected, well sampled, and free of intentional bias so that it best reflects a population.
How to Prevent Data Bias
Establishing a clear guiding question and parameters to work within, such as the limits of the data’s application, helps steer the collection of data and direct the conclusions drawn from it. Objective and meticulous oversight of data sampling and collection helps confirm that the data collected pertains to the question at hand while also protecting participant privacy. Presenters and interpreters of this information can then provide caveats about its application. Additionally, if people provided personal data, collectors could show participants what the data looks like and ask for continuing consent. This can help prevent data mishandling and breaches of privacy.
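One way to make that oversight concrete is a simple representativeness check before analysis begins: compare the sample’s makeup against a known population benchmark and flag large gaps. The groups, shares, and tolerance below are hypothetical placeholders, not real survey data.

```python
# Sketch of a sampling oversight step: compare the collected sample's
# composition to a population benchmark and flag groups that are badly
# over- or under-represented. All figures are hypothetical placeholders.

population_share = {"Group 1": 0.30, "Group 2": 0.45, "Group 3": 0.25}
sample_counts = {"Group 1": 250, "Group 2": 420, "Group 3": 130}

TOLERANCE = 0.05  # flag any group more than 5 percentage points off (assumed)

total = sum(sample_counts.values())
for group, expected in population_share.items():
    observed = sample_counts[group] / total
    gap = observed - expected
    flag = "  <-- review sampling" if abs(gap) > TOLERANCE else ""
    print(f"{group}: expected {expected:.0%}, sampled {observed:.0%}{flag}")
```

A check like this does not remove bias by itself, but it surfaces gaps early, while the sampling plan can still be corrected.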
Another way to ensure proper handling of data comes from the very people doing it. A team with diverse perspectives and experiences can help identify biased trends within data and propose alternate courses of action. Lastly, remembering the limits of technology helps humanize a heavily quantitative endeavor. Considering both the long-term and short-term effects of using data, and of sharing it with the public, can help maintain privacy, provide a data quality check, and connect the conclusions back to the original question. After all, even machines and tools can have prejudices, carrying over biases from the environment that built them or from the data fed into them. Technology is not the ultimate decider but a tool, and data does not decide but informs decisions.
Preventing Data Bias with ITC
To prevent data bias, statisticians and data scientists can incorporate best practices into their workflow, from beginning to end. At ITC, we take special care with the data we collect and how we use it, applying processes and safeguards to ensure that data serves the people best.
ITC has supported many government clients in developing data governance – including standards to prevent data bias – because ITC recognizes that data bias prevention is simply another facet of data governance. This work has included protecting data from inappropriate sharing, automating data flows while adhering to data regulations, setting governance standards that support low-bias data, and more.
If you are interested in joining the ITC data team, check out our current openings at IT Concepts – Current Openings (workable.com).
Works Cited
Amazon: Dastin, Jeffrey. “Amazon scraps secret AI recruiting tool that showed bias against women.” Reuters, October 18, 2018.
Predictive Policing: Heaven, Will Douglas. “Predictive policing algorithms are racist. They need to be dismantled.” MIT Technology Review, July 17, 2020.