Ethical Responsibility in Data Processing
The article describes data processing as a curatorial process rather than a neutral technical task and argues that values and biases are encoded at every stage of the data lifecycle. To safeguard against ethical failures like the OkCupid data release and Amazon's biased recruiting tool, the article proposes grounding data processing in a framework centered on dignity, autonomy, necessity, and the common good. Finally, the article encourages data practitioners to interrogate the "why" behind every technical decision so that data systems serve to empower individuals and society rather than becoming instruments of exploitation, bias, and harm.
Alexander S. Ricciardi
January 23rd, 2026

The goal of data processing is to convert raw data into useful information that can be used for analysis, modeling, and operational action. This process follows a lifecycle of six stages: Collection, Preparation, Input, Processing, Output, and Storage (UAGC Staff Member, 2024). These stages are often described as "janitorial," purely technical work. That view is dangerously reductive: the process is far from a neutral activity or a harmless technical exercise. Decision-making is an integral part of data processing, and subjective values can be encoded, intentionally or unintentionally, at every stage. Boyd, a partner researcher at Microsoft Research and the founder and president of Data & Society, stated, "There is nothing about doing data analysis that is neutral... what and how data is collected, how the data is cleaned and stored... all of this is political" (Boyd, 2017, p. 1). Data processing, in other words, is not a purely neutral, janitorial, technical task but a curatorial process with ethical implications. This article argues that data processing is the primary locus where abstract values are transformed into concrete outcomes. Consequently, the process must be encapsulated in an ethical framework from planning through implementation to prevent the encoding of harmful biases and the violation of four human rights principles: Dignity, Autonomy, Necessity, and the Common Good.
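To make the lifecycle concrete, the skeleton below sketches the six stages as a hypothetical Python pipeline. The stage names come from the lifecycle above; every function name, body, and comment is illustrative only, a sketch of where supposedly neutral steps hide decisions rather than a reference implementation.

```python
# Hypothetical skeleton of the six-stage data processing lifecycle.
# Function names and bodies are placeholders for illustration only.

def collect():                     # Collection: gather raw data (sensors, surveys, logs)
    ...

def prepare(raw):                  # Preparation: clean and wrangle, remove "errors"
    ...

def encode(clean):                 # Input: convert cleaned data into machine-readable features
    ...

def process(features):             # Processing: run algorithms (AI/ML) to uncover patterns
    ...

def report(results):               # Output: deliver reports, graphs, dashboards
    ...

def store(results, metadata):      # Storage: record data and metadata, set access rules
    ...

def lifecycle():
    raw = collect()
    clean = prepare(raw)           # who decided what counts as an "error" here?
    features = encode(clean)       # which categories flatten whose identity?
    results = process(features)    # why this algorithm over another?
    report(results)                # how are results framed for the decision-maker?
    store(results, metadata={})    # who can access this, and for how long?
```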
The Neutral Tool Fallacy
A common misconception is the assumption that technology is ethically neutral, that its morality depends not on its design but on how the technology is used. This is analogous to the argument that "guns don't kill people, people do," a logic that ignores why a tool is designed the way it is. That "why" defines the intent behind the design, embedding specific features into the tool's functionality before it is ever used. In data processing, decision-making is an integral part of the design and implementation phases, meaning that the choices made during the data lifecycle are not just technical; they are curatorial. For example, when a data architect decides which attributes to gather or which outliers to discard, they are deciding what counts as a data signal; in other words, whose perspective is deemed eligible and whose is dismissed as noise or treated as an exception. These curatorial decisions occur at each stage of the data processing lifecycle, raising ethical questions such as:
In the collection stage, when gathering raw data from sources (sensors, surveys), what is being measured, what is being ignored, and why?
In the preparation stage, when cleaning or wrangling the collected data to remove errors, who defines what counts as an error or an outlier, and why is it defined that way? (See the sketch after this list.)
In the input stage, when converting the cleaned data into machine-readable formats for a machine learning application, why are these formats being used? Do they accurately reflect the context, identities, and constraints the cleaned data represent, or do they reduce that complexity to misleading, simplified categories?
In the processing stage, when running algorithms (AI/ML) on the formatted data to uncover patterns, why have these algorithms been selected? Do they prioritize efficiency over accuracy and transparency?
In the output stage, when transmitting or analyzing the final results from processed datasets and models as reports, graphs, or dashboards, why have these methods been chosen? How are the final results framed, displayed, or delivered to the decision-maker or the public?
In the storage stage, when recording data and metadata, who can access the data and why, how long is it retained, and can it be repurposed beyond the scope of the original consent?
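To ground the preparation-stage question, the snippet below is a minimal, hypothetical pandas sketch; the column name, the values, and the 1.5 × IQR rule are assumptions chosen only to show that a routine cleaning step quietly decides whose record counts as signal.

```python
import pandas as pd

# Hypothetical survey data: the column name and values are illustrative only.
df = pd.DataFrame({"annual_income": [18_000, 22_000, 25_000, 31_000, 250_000]})

# A routine "cleaning" rule: Tukey's fences, dropping rows outside 1.5 * IQR.
q1, q3 = df["annual_income"].quantile([0.25, 0.75])
iqr = q3 - q1
cleaned = df[df["annual_income"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# With this data, the 250,000 record is silently dropped. The 1.5 * IQR cutoff
# looks like a neutral default, but it is a curatorial choice: it decides whose
# record is treated as signal and whose is dismissed as noise.
print(cleaned)
```

A different cutoff, or a decision to keep and flag the record instead of dropping it, would answer the "who defines an outlier" question differently.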
This set of questions shows that, once each technical choice is examined through its "why", the rationale behind it, the data processing lifecycle is not a neutral tool: perspective and intent are integral to its design and implementation. By interrogating the "why" of each technical decision, a data processing professional practices ethical responsibility; failing to ask these questions risks injecting biases hidden behind technical requirements and arguments.
Ethical Framework for Data Practitioners
To practice ethical responsibility in data processing, practitioners must adopt an ethical framework grounded in the preservation of human rights, one that evaluates every technical data processing decision against four ethical questions:
Does it preserve or enhance human dignity?
Does it preserve the autonomy of the human?
Is the processing necessary and proportionate?
Does it uphold the common good?
(O'Keefe & Brien, 2023, p. 35)
This set of questions anchors an ethical framework that evaluates each data processing decision against four principles of human rights: Dignity, Autonomy, Necessity, and the Common Good. Dignity asks whether the processing respects the worth of individuals or reduces them to simple data points. Autonomy asks whether the processing preserves individuals' freedom and agency through controls such as consent and privacy protection, giving people agency over their personal data and the choice of how, or whether, it is processed. Necessity asks whether the processing is necessary and proportionate, meaning its benefits outweigh the risks of harm or exploitation. Finally, the Common Good asks whether the processing's design and implementation genuinely serve the public interest. In practical terms, the framework questions the intent of data processing decisions by weighing their expected outcomes, letting practitioners judge whether the rationale, the "why", behind each technical choice preserves human rights. Moreover, applying this framework could prevent ethical failures like the OkCupid data release and the Amazon recruiting tool bias, cases that damaged public trust in both organizations.
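As one hypothetical way to put the framework into practice, the sketch below records the four questions as an explicit review artifact that a team could require before a processing decision ships. The class and field names are assumptions for illustration, not an established tool or standard.

```python
from dataclasses import dataclass

@dataclass
class EthicsReview:
    """Hypothetical pre-implementation record for one data processing decision."""
    decision: str             # the technical choice being reviewed (the "what")
    rationale: str            # the reasoning behind it (the "why")
    preserves_dignity: bool   # respects individuals rather than reducing them to data points?
    preserves_autonomy: bool  # respects consent and control over personal data?
    is_necessary: bool        # necessary and proportionate to the stated purpose?
    serves_common_good: bool  # genuinely serves the public interest?

    def approved(self) -> bool:
        return all((self.preserves_dignity, self.preserves_autonomy,
                    self.is_necessary, self.serves_common_good))

# Example: reviewing a preparation-stage choice before it is implemented.
review = EthicsReview(
    decision="Drop records with missing ZIP codes before training",
    rationale="Simplifies joins with the demographics table",
    preserves_dignity=True,
    preserves_autonomy=True,
    is_necessary=False,        # imputation would work; dropping excludes a population
    serves_common_good=False,
)
assert not review.approved()   # the decision is blocked until the rationale changes
```

The point is not the code itself but that the "why" behind each decision gets written down and checked against the four principles before implementation.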
OkCupid Release Case Study
In 2016, researchers scraped 70,000 profiles (including usernames, demographics, and intimate questionnaire answers) from OkCupid, a popular dating app, and released the dataset publicly on the Open Science Framework for psychological research without asking the users or the company for permission (Woollacott, 2016). The researchers justified making the data publicly available on a technicality, arguing that the data was by nature public because user profiles were viewable to other users on the OkCupid site. This argument is not ethically viable, as it violates the Autonomy principle. Although the data was, to some extent, technically accessible to the public, the users had shared it in the context of a dating platform; they never consented to their personal data being reprocessed for research or public distribution. The release violated the guidelines of both the Association of Internet Researchers and the American Psychological Association, which state that participants have a right to informed consent, including an understanding of how their data will be used (Franzke et al., 2020; APA, 2014). By labeling the processing of the dataset as a neutral technicality and prioritizing it over the autonomy of the individuals, the researchers violated the fundamental human right to control one's own digital identity. It was a clear ethical failure.
Amazon's Recruiting Tool Bias Case Study
In 2014, Amazon engineers began building automated hiring software to help find top talent, using a supervised machine-learning model to screen and rank job applicants' résumés (Dastin, 2018). A couple of years later, the project was scrapped after Amazon's own machine-learning specialists found the model to be biased against women. The specialists traced the bias to the training dataset, which consisted of résumés from Amazon tech applicants and mirrored the historically male-dominated U.S. tech industry. This introduced an implicit gender bias into the model's training, and the tool ended up penalizing résumés that included the word "women's," as in "women's chess club captain." Such an oversight may be attributed to treating the collection and preparation stages of the data processing lifecycle as a neutral technical process rather than a curatorial one with ethical implications: the engineers failed to adequately clean and rebalance the dataset to account for the historical underrepresentation of women. Analyzed through the ethical framework, the engineering team failed to apply the Dignity and Common Good principles and to examine the "whys" behind their technical decisions, choosing technical processes that reflected historically successful raw statistical patterns over ensuring that the tool's résumé-vetting process was equitable and fair. Had the tool been deployed as it was, it could have exposed Amazon to major lawsuits and reputational damage. For this project, Amazon failed to embed ethical principles and guardrails within its software development process. As Vakkuri et al. (2019) argue, the mere presence of ethical tools and considerations in the design and implementation of software can increase software engineers' sense of ethical responsibility.
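A preparation-stage audit of the kind that might have surfaced such an imbalance could look like the hypothetical sketch below. The column names, labels, and inverse-frequency reweighting are illustrative assumptions, one common rebalancing technique rather than Amazon's actual pipeline.

```python
import pandas as pd

# Hypothetical training data: columns and values are illustrative only.
resumes = pd.DataFrame({
    "gender": ["male"] * 8 + ["female"] * 2,
    "hired":  [1, 1, 0, 1, 0, 1, 1, 0, 1, 0],
})

# Preparation-stage audit: how is each group represented in the training data?
counts = resumes["gender"].value_counts(normalize=True)
print(counts)  # male 0.8, female 0.2 -> the historical skew is now a model input

# One simple mitigation: inverse-frequency sample weights so each group
# contributes equally during training instead of mirroring the skew.
weights = 1.0 / resumes["gender"].map(resumes["gender"].value_counts())
resumes["sample_weight"] = weights / weights.sum()
print(resumes.groupby("gender")["sample_weight"].sum())  # each group now sums to 0.5
```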
In Conclusion
These case studies demonstrate that data processing is often viewed as a neutral technical process rather than a curatorial process that must be designed and implemented within an ethical framework. This view is dangerously reductive, as data processing is far from a neutral activity or a purely harmless technical exercise. Specifically, the failure to recognize the ethical implications of every technical decision within the data processing lifecycle allows subjective values and biases to be encoded into the process, often at the expense of human rights. To prevent these ethical failures and biases, data practitioners must move beyond the fallacy that data processing is just a neutral tool and recognize that every decision to include, exclude, or transform data is an ethical choice. This can be accomplished by encapsulating the design and implementation of each data processing stage within an ethical framework that upholds human rights, asking the "why" behind every technical decision and evaluating each decision against the four principles of human rights: Dignity, Autonomy, Necessity, and the Common Good. Ultimately, by shifting from viewing data processing as a purely neutral, janitorial, technical process to viewing it as a curatorial process with ethical implications, data practitioners can ensure that the systems they build empower individuals and society rather than becoming instruments of exploitation, bias, and harm.
References:
APA. (2014, June 4). APA: Psychologists should obtain informed consent from research participants [Press release]. American Psychological Association. https://www.apa.org/news/press/releases/2014/06/informed-consent
Boyd, D. (2017, April 12). Toward accountability: Data, fairness, algorithms, consequences. Data & Society: Points. Medium. https://medium.com/datasociety-points/toward-accountability-6096e38878f0
Dastin, J. (2018, October 11). Amazon scraps secret AI recruiting tool that showed bias against women. Reuters. https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G/
Franzke, A. S., Bechmann, A., Zimmer, M., Ess, C., & Association of Internet Researchers. (2020). Internet research: Ethical guidelines 3.0 [PDF]. Association of Internet Researchers. https://aoir.org/reports/ethics3.pdf
O’Keefe, K., & Brien, D. O. (2023). Data ethics: Practical strategies for implementing ethical information management and governance. Kogan Page.
UAGC Staff Member. (2024, June 18). What is data processing? University of Arizona Global Campus. https://www.uagc.edu/blog/what-data-processing
Vakkuri, V., Kemell, K.-K., & Abrahamsson, P. (2019). Ethically aligned design: An empirical evaluation of the RESOLVEDD-strategy in software and systems development context (arXiv:1905.06417) [PDF]. arXiv. http://arxiv.org/pdf/1905.06417
Woollacott, E. (2016, May 13). 70,000 OkCupid profiles leaked, intimate details and all. Forbes. https://www.forbes.com/sites/emmawoollacott/2016/05/13/intimate-data-of-70000-okcupid-users-released/

