Imagine sending a text message to a friend. As your fingers tap the keypad, words and the occasional emoji appear on the screen. Perhaps you write, “I feel blessed to have such good friends :)” Every character conveys your intended meaning and emotion.

But other information is hiding among your words, and companies eavesdropping on your conversations are eager to collect it. Every day, they use artificial intelligence to extract hidden meaning from your messages, such as whether you are depressed or diabetic.

Companies routinely collect the digital traces we leave behind as we go about our daily lives. Whether we’re buying books on Amazon, watching clips on YouTube, or communicating with friends online, evidence of nearly everything we do is compiled by technologies that surround us: messaging apps record our conversations, smartphones track our movements, social media monitors our interests, and surveillance cameras scrutinize our faces.

What happens with all that data? Tech companies feed our digital traces into machine learning algorithms and, like modern-day alchemists turning lead into gold, transform seemingly mundane information into sensitive and valuable health data. Because the links between digital traces and our health emerge unexpectedly, I call this practice mining for emergent medical data.

A landmark study published in June, which used AI to analyze nearly 1 million Facebook posts containing more than 20 million words, shows how invasive this practice can be. According to the authors, “Facebook status updates can predict many health conditions” such as diabetes, hypertension, and gastrointestinal disorders.

Many of the words and phrases analyzed were not related to health. For instance, the presence of diabetes was predicted by religious language, including words related to God, family, and prayer. Those words often appeared in non-medical phrases such as, “I am blessed to spend all day with my daughter.”

Throughout history, medical information flowed directly from people with health conditions to those who cared for them: their family members, physicians, and spiritual advisers. Mining for emergent medical data circumvents centuries of well-established social norms and creates new opportunities for discrimination and oppression.

Facebook analyzes even the most mundane user-generated content to determine when people feel suicidal. Google is patenting a smart home that collects digital traces from household members to identify individuals with undiagnosed Alzheimer’s disease, influenza, and substance use disorders. Similar developments may be underway at Amazon, which recently announced a partnership with the United Kingdom’s National Health Service.

Medical information that is revealed to health care providers is protected by privacy laws such as the Health Insurance Portability and Accountability Act (HIPAA). In contrast, emergent medical data receives virtually no legal protection. By mining it, companies sidestep privacy and antidiscrimination laws to obtain information most people would rather not disclose.

Why do companies go to such lengths? Facebook says it performs a public service by mining digital traces to identify people at risk for suicide. Google says its smart home can detect when people are getting sick. Though these companies may have good intentions, their explanations also serve as smoke screens that conceal their true motivation: profit.

In theory, emergent medical data could be used for good. People with undiagnosed Alzheimer’s disease could be referred to physicians for evaluation; those with substance use disorders could be served ads for evidence-based recovery centers. But doing so without explicit consent violates individuals’ privacy and is overly paternalistic.

Emergent medical data is extremely valuable because it can be used to manipulate consumers. People with chronic pain or substance use disorders can be targeted with ads for illicit opioids; those with eating disorders can be served ads for stimulants and laxatives; and those with gambling disorders can be tempted with coupons for casino vacations.

Informing and influencing consumers with traditional advertising is an accepted part of commerce. However, manipulating and exploiting them through behavioral ads that leverage their medical conditions and related susceptibilities is unethical and dangerous. It can trap people in unhealthy cycles of behavior and worsen their health. Targeted individuals and society suffer while corporations and their advertising partners prosper.

Emergent medical data can also promote algorithmic discrimination, in which automated decision-making exploits vulnerable populations such as children, seniors, people with disabilities, immigrants, and low-income individuals. Machine learning algorithms use digital traces to sort members of these and other groups into health-related categories called market segments, which are assigned positive or negative weights. For instance, an algorithm designed to attract new job candidates might negatively weight people who use wheelchairs or are visually impaired. Based on those negative ratings, the algorithm might deny them access to job postings and applications. In this way, automated decision-making screens people in negatively weighted categories out of life opportunities without considering their desires or qualifications.
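To make that mechanism concrete, here is a minimal, purely hypothetical Python sketch of how weighted market segments can quietly screen people out of an opportunity. The segment labels, weights, threshold, and function names are my own illustrative assumptions, not any company's actual code or algorithm.

from dataclasses import dataclass, field

# Hypothetical weights an automated system might assign to inferred segments;
# negative values effectively exclude people from the audience for a job ad.
SEGMENT_WEIGHTS = {
    "frequent_gym_checkins": 0.4,
    "inferred_diabetes": -0.6,            # inferred from non-medical digital traces
    "inferred_mobility_impairment": -0.8,
    "inferred_visual_impairment": -0.8,
}

@dataclass
class Profile:
    user_id: str
    segments: set = field(default_factory=set)  # categories assigned by the algorithm

def audience_score(profile):
    """Sum the weights of every market segment the profile was sorted into."""
    return sum(SEGMENT_WEIGHTS.get(s, 0.0) for s in profile.segments)

def sees_job_posting(profile, threshold=0.0):
    """People whose score falls below the threshold never see the posting -- and never know why."""
    return audience_score(profile) >= threshold

applicant = Profile("user_123", {"inferred_diabetes", "frequent_gym_checkins"})
print(sees_job_posting(applicant))  # False: screened out by inferred health data

Nothing in this sketch ever asks about qualifications or consent; the exclusion follows entirely from health categories inferred from digital traces.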

Last year, in a high-profile case of algorithmic discrimination, the Department of Housing and Urban Development (HUD) accused Facebook of disability discrimination when it allowed advertisers to exclude people from receiving housing-related ads based on their disabilities. But in a surprising turn, HUD recently proposed a rule that would make it more difficult to prove algorithmic discrimination under the Fair Housing Act.

Because emergent medical data is mined secretly and fed into black-box algorithms that increasingly make important decisions, it can be used to discriminate against consumers in ways that are difficult to detect. On the basis of emergent medical data, people might be denied access to housing, jobs, insurance, and other important resources without even knowing it. HUD's proposed rule would make that easier to do.

One section of the rule allows landlords to defeat claims of algorithmic discrimination by “identifying the inputs used in the model and showing that these inputs are not substitutes for a protected characteristic …” This section gives landlords a green light to mine emergent medical data because its inputs — our digital traces — have little or no apparent connection to health conditions or disabilities. Few would consider the use of religious language on Facebook a substitute for having diabetes, which is a protected characteristic under the Fair Housing Act. But machine learning is revealing surprising connections between digital traces and our health.

To close gaps in health privacy regulation, Sens. Amy Klobuchar (D-Minn.) and Lisa Murkowski (R-Alaska) introduced the Protecting Personal Health Data Act in June. The bill aims to protect health information collected by fitness trackers, wellness apps, social media sites, and direct-to-consumer DNA testing companies.

Though the bill has some merit, it would put consumers at risk by creating an exception for emergent medical data: One section excludes “products on which personal health data is derived solely from other information that is not personal health data, such as Global Positioning System [GPS] data.” If passed, the bill would allow companies to continue mining emergent medical data to spy on people’s health with impunity.

Consumers can do little to protect themselves other than staying off the grid. Shopping online, corresponding with friends, and even walking along public streets can expose any of us to technologies that collect digital traces. Unless we do something soon to counteract this trend, we risk permanently discarding centuries of health privacy norms. Instead of healing people, emergent medical data will be used to control and exploit them.

Just as we prohibit tech companies from spying on our health records, we must prevent them from mining our emergent medical data. HUD's proposed rule and the Klobuchar-Murkowski bill are steps in the wrong direction.

Mason Marks, M.D., is an assistant professor of law at Gonzaga University and an affiliate fellow at Yale Law School's Information Society Project.
