Discussion: Data Monopolies (May 2020)

Jan 18, 2021

Kelsey Breseman, Data Together

Most of our data and information is controlled by a handful of companies. How did this come to be, what are examples of responsible and irresponsible holding of this power, and how do we imagine we might slip the trap of data monopolies?


Anticompetitiveness, and how did we get here?

What are the things we worry about with monopolies?

Data monopolies in a COVID era?

Optional bits to play around with for discussion:

A tessellated glass structure

Photo by Anna Gru on Unsplash

EU vs US interpretations of monopoly

JONATHAN: In today’s discussion, we want to be able to ask what is good and bad about a monopoly. In order to do that, we need some grounding: how did we get to where we are?

When we think about modern applications, such as how we treat something like a COVID response, we need to ask: what are the things that we’re trying to protect against? Which parts of this tech application are good, and which are bad? Is data centralization the same thing as monopoly? Are there bounding conditions on when that might be necessary?

Our grounding reading selections were meant to show ways that the EU and the US have historically viewed monopolies and anti-competitive work. I think both views of the world center on, we want to do things that are good for consumers. The definition of what “good” is, is where things are fuzzy.

My understanding is that the way the US has historically approached antitrust has changed pretty significantly. I think it was Milton Friedman who proposed that the way that we should consider antitrust is from a very consumer-driven lens: if it’s not hurting the consumer, then why do we care?

Largely, I think the legislation that has been passed in the US, and the way that it’s been pursued in courts, has been around price. When we talk about consumer harm, we’re talking about a specific lens of consumer harm. We’re talking about it from this one imperfect metric: did you pay too much or not?

There are functional differences in the law as well; the European law is not written as specifically. The New Yorker article talks about how the two key operative words inside of the EU legislation are “for example”. The law says, “this should not hurt consumers, and for example…”, there’s a number of things that are listed. By saying, “for example”, that law gives the European courts a lot of flexibility when they define what has actually violated consumer need, or what is disadvantaging consumers.

In the American lens, we don’t necessarily price in externalities. Are we truly pricing in all the other forms of harm? European courts have interpreted “harm” to include fewer options. Might it also mean violations or potential violations of civil liberties? How about potential national security risks: if Twitter, for example, has too much data, and Saudi Arabia pays a Twitter employee $300,000 to leak information, is that dangerous to me?

KELSEY: That’s a really great summary. My impression was that, in both US and EU interpretations, the intent of antitrust is to not stifle innovation. But then it was a question of, innovation to what end? The American interpretation ended up being innovation to the end of consumer protection, whereas the EU take is maybe innovation as an end in itself.

Contrast that to that Brandeis quote from the Goliath reading, “Anyone living under a monopoly, subject to the whim and caprice of a few self-appointed industrialists, is not a free citizen,” versus this somewhat opposite concept, which is much more Friedman, that regulation of any kind is inherently bad and stifling to free democracy.

The scope of monopoly

MATT: To me, the EU/US divide is useful to hear, but sort of uninteresting. I don’t care what’s been instantiated as law in different jurisdictions. I am interested in what models exist for describing: what are the ills that can result from monopolistic capitalism, or just from monopolies in general?

It seems to me that there are two main conceptions at work in the conversation so far. One is purely about consumption: is there an economic penalty that we pay for the existence of monopoly? The other has been described here as innovation, but it seems to me it just has a broader conception of what bads result from monopolies: is there some form of social good which is sacrificed when a monopoly comes into being?

That social good could be economic in a broad sense—so, it could involve innovation. Or it could have a less well-defined ill effect, a kind of moral effect, on how we live, how we think of ourselves, how we relate to one another in a society; Brandeis says it has some effect on freedom, whatever that means.

It seems to me that this broader conception is the one that’s interesting. We wouldn’t be all up in arms about the monopolies that Google and Apple exercise for the sake of $7 a month. I think we care because of the idea that there’s some moral harm that comes to us as a result of living under a monopoly.

BRENDAN: I think you make a lot of really good points, Matt. But I think that the specifics of these two interpretations are really important.

In the traditional interpretation, capitalism is a part of functioning society; unbridled capitalism is a problem—so we need some sort of referee. The existence of antitrust as a category of regulation is all an admission that there’s a thing that needs to be brought into balance in this capitalist equation. I think the framing of “consumer good” that you see in the US is Exhibit A of where a data monopoly is, as Jonathan said, a failure to price in certain problems.

The particulars of the laws and antitrust cases are a really interesting point of departure, because they get you back to the big conversations really quickly. They’re the result of a lot of thought, of someone trying to reduce down those big ideas.

If you read any of Jeff Bezos’s letters to his investors, it’s a magnum opus on antitrust. He’s a masterful mind when it comes to designing something that evades the current understanding of antitrust in the world. I think the EU’s interpretation has been far more broad, but it is also continually accused of being nonspecific.

JONATHAN: The Getty versus Google case is an interesting example. There is a clear harm, from Google Image Search, to Getty and the artists who use the platform. Google is enabling the piracy, whether willfully or not, of a bunch of information. They’re stealing ad revenue, too, from Getty, and that’s causing material harm to all the people up that chain. And so of course the EU does what seems like the right thing and enables Getty to pursue and win a case against Google. And yet that leads to “outrage” from people who just want to access images.

Defining data monopolies

PETER: Is there a definition of data monopoly that we agree on? Can you give examples of what feels like a data monopoly to you?

MATT: One thing that Jonathan was discussing was the geographical data sets owned by Google and Apple of how people move, as tracked by their phones. “Monopoly” is maybe a little too strict a term, but these are non-competitive databases for two sets of users. The possession of those datasets confers to the owners a set of privileges, that market position in two ecosystems—that’s an advantage multiplier. They have these market positions that are already well-entrenched, and the construction of their services around data over which they and only they have control is a way of rendering permanent their market advantages.

JONATHAN: The idea of a data monopoly, as I am currently defining it in my head, is taking the existing ideas of monopoly and applying them to the data context.

Ben Thompson talks about this idea of aggregator theory. He applies it across several tech platforms. There’s a point that you made that I think really nicely intersects there. You have this ephemeral thing that you could think of as the social graph, how we know each other and what we know about each other. That’s not something that is owned by Facebook. But by us all having put that data into this one organization, Facebook is now claiming some ownership of the maintenance of those relationships.

There’s market potential where Facebook can cut off access to other apps, and from other aspects, economic impacts for companies that can or cannot exist or need to exist within the rule of Facebook. But I think there’s an underlying issue, which is not a pricing issue. This is a power issue. As more people adopt the platform, more of this data is pumped into this one monopoly, and if we don’t challenge the notion that this data should be owned by an organization, then I think there’s questions that fall out. What are the restrictions on power? Or do they have unbridled power?

When I think of data monopolies, I’m thinking mainly from the user context, and that ends up boiling down to a handful of very specific tech entities.

BRENDAN: I have a definition of a data monopoly that I personally use, and it’s more boring, less useful, and has a bit of a higher bar. The way that I understand this problem is by asking, is this a monopoly, or is this an economy of scale? If we consider the original notion of monopoly, you could say Apple has a monopoly on the iPhone, or Apple has a monopoly on great laptops. But no, they have an economy of scale. It’s very expensive to spin up a company that makes laptops, and they’ve figured out a way to get there. I think that’s healthy innovation.

A classic monopoly is when an actor in a market leverages their position to suppress competition.

I subscribe pretty heavily to the belief that all data, at some level, is an observation of the real world, structured in some way and jammed into a database somewhere. I think that’s really important when you talk about a data monopoly, because I think for something to constitute a data monopoly, the actual correctness of the data must reside with the owner of the data.

Using that model, I don’t think that Facebook’s social graph is a data monopoly, because at the end of the day, my friends are my friends and I choose who they are. If I ever logged on to Facebook and said, unfriend this person, and Facebook said no, or if I discovered that they continued to advertise us as friends elsewhere, I would be very frustrated. I might leave Facebook. Facebook is not the arbiter of that truth.

I think it should be a very high bar for someone to say that you have a data monopoly. Monopoly should imply that, if we change this number in this database, you are no longer friends. That should be the bar for a social graph monopoly.

Monopoly as a lack of alternatives

JONATHAN: Does Amazon have a monopoly on online sales, functionally? You could argue, no, you could easily stand up your own website. But what percent of ecommerce is going through Amazon? I think there’s a similar effect. Even if you, as the user, get to modify and update the data, there is a question: what is the ability of a competing social network or anyone else to actually create that data asset? And does Facebook by its mere existence negate that network effect?

Facebook takes anticompetitive actions. What does Instagram do when Snapchat is about to release something? Rip it as quickly as possible. There’s also this lock-in mechanism, which is anti-user but pro-Facebook. If you were to say, I need to leave Facebook, how do they enable that? You do own the correctness of the data, or if you were to turn off your profile, I think with GDPR, there are more rights associated with you now. But there are still questions about, what has already been scraped and pulled elsewhere? Clearview AI has been in the news as super creepy on this front. Do you even have control at that point?

Maybe it is an economy of scale, but at a certain functional scale, you suffocate everyone else. The bar for building out your own social network, again, is pretty high.

And then, what sort of choices do we then subscribe to if Facebook has the ability to keep most of us on there and retain this information? What algorithms or value choices are made on our behalf? All of this stems from, they maintain the data, and they have an ability to make it harder to migrate off of their platforms.

PETER: In relationship to Facebook, the biggest thing is this feeling of loss of control. And it isn’t just about data, or representation. It’s about to what extent we can shape our social relationships.

Where do our intuitions for monopolies come from? For me, it’s the company town, the company store, and your ability to choose alternatives. It’s pretty clear what an alternative to the company store might look like. But what service is Facebook providing? Each of us actually does use it quite a bit differently. It plays a whole bunch of different kinds of roles in social life, in organizing community events, in getting your brand out there. So it’s really hard to think about. I think that for any one of those, people might be able to bring a case about how they don’t have a real alternative. Is data central to that? Is data the unit of control?

That sense of control is not just the data itself that I put in there, the particular records or interactions, but my ability to shape those interactions. The timeline is super disempowering, because nobody knows how it works. If they shape the UI in a particular way and that impacts how I can interact with the world, I have no idea what’s going on. I can’t say, show me all of the coefficients for how much you think I like this person so I can tweak them.

Monopoly as owning the source of truth

BRENDAN: I would posit that your social security number and your credit score are better examples of condoned monopolies. If someone changes your social security number, your life is over. Whoever controls the association between your name and your credit score, that is where there is an inversion between what’s real on the ground, my actions, and a figure in a database. A modification of that database has horrific impacts in the real world.

With a credit score, there’s a truth that is intrinsic to the world that we’re trying to model with the database, your financial reputation. Actually measuring the raw thing is impossible. And so we use these proxies. We write down those proxies, and then we rely on the people that keep them. So we get these chains, from a thing that is true in the world, to people who are in the business of writing down that thing and becoming a proxy for truth.

To me, those are easier examples for understanding what a monopoly is. I think it’s really important to cleave apart something that can be frustrating or bad or an economy of scale from where we need to step in from the regulatory perspective. Antitrust action should be a really “break glass in case of emergency” moment, in my mind. There should be other mechanisms to fix stuff like Facebook being terrible.

Monopoly as exclusive power

KELSEY: I checked the dictionary online in the meantime, and a monopoly is “the exclusive possession or control of the supply of a traded commodity or service”, which is really not what we’re talking about here.

Brendan’s definition definitely talked about active suppression. That leads me to the question: is it possible then to accidentally or incidentally be in a monopoly? I think yes.

Then there’s the pragmatic definition that you can get out of the Getty lawsuit. They had this great quote. The Getty folks were suing Google over whether they could make an image available in HD; Getty was saying, nobody visits our site anymore because your site just finds our stuff and surfaces it to people better. Google said, well, fine, we can exclude you if you want, you can either be on our site or not be on our site. The Getty response was that that was no option at all: “allowing the harm to continue, or becoming invisible on the internet”. And that’s a pretty great pragmatic definition of monopoly: you do it our way, or you don’t exist.

MATT: Google has monopolistic power as a search provider. Fundamentally, Google has a gigantic database of links and how they’re related to each other, which is a form of power over data. It leverages that power in lots of different ways, in this case with Getty, to force conformance to its parameters.

PETER: Data is a resource; the market is publicity. In the case of Getty versus Google, Google has a bunch of data that protects it from competition by other search engines. Getty’s problem is one of publicity, and Google is the only way that you can get that. It’s Google or nothing. So those are two different issues, or issues at two different levels.

Data centralization for COVID contact tracing

JONATHAN: One of the things that we may or may not see in this COVID era is that having these organizations with such vast reach, and a pretty comprehensive ability to use the data how they need to, has led to some positive social outcomes. Facebook is publishing data about whether people are maintaining quarantine. Google and Apple are able to leverage those data assets to create this large-scale contact tracing thing for the world, which is amazing.

KELSEY: For varying connotations of “amazing”.

BRENDAN: I have done a bit of a 180 on blockchains in the world of Coronavirus. For me, the frustration with blockchains has always been that they seem like a lot of work for very little payoff. But when we have these immutable sources of information, and we’re going to put some intermediary between us and that thing, we now have a very pressing need for accuracy on this.

We need to be able to contact trace, to actually understand how a virus is propagating through the community. This is a situation where the freedoms of the individual and the freedoms of the group are in direct, diametric pressure. But we don’t want to end up in a situation where some one person or entity becomes the arbiter of contact tracing.

We should be using a decentralized ledger for this kind of thing, and participation in this should be mandatory. Many of the existing solutions just don’t cut it. This is a lot of material, it will affect some people’s lives, and the disposal of that information is really problematic.

MATT: The problem that I see as obvious is privacy with healthcare data. Blockchain allows you to validate a claim, but it doesn’t really help you keep things private, does it?

BRENDAN: It can be used to mean that the number of actors in a space cannot be totally known. It’s basically like a network of only social security numbers that can change but have to change in a causal way. This isn’t a blockchain in the classic sense.

I think that it’s important to highlight the problem here. You have an immutable source of information, and then a lot of entities who are vying to be a central arbiter of truth. And to me, that is the definition of a data monopoly, that we’re being asked to grant a monopoly over a sort of truth-granting style of information.

Anybody can join the mining process of a blockchain. So anybody can start participating in a blockchain that properly hides your identity, but allows you to control the disclosure of information.

Monopoly as self-increasing power

KELSEY: If we say that a monopoly is defined as control of the sole source of a particular thing, or type of thing…. an exercise that I would like to do sometime is to chart out the parts of me that Google has come to own over the years. The most dramatic of those, I think, is when they bought Fitbit. It’s something like, you have been reading my emails for years, know who all of my contacts are and where I go based on my phone, and you’re probably in my home, listening to every word that I say, and there’s a camera. And additionally, now you know my heart rate every day at every moment.

If we take all that, and feed that into a machine learning model, the number of columns of data they’ve added means that they can make categorically different models of people than anybody else. I’d like to posit that as a quite different definition of data monopoly.

If I were purchasing models of people, Google would be the only supplier of this complete of a model.

MATT: Why isn’t that just because they have a better product than other people?

KELSEY: Isn’t that where monopolies come from?

MATT: I don’t think so. A lot of them come from guns and dollars. A lot of the oil monopolies come from suppressing various kinds of dissent over the course of the late 19th and early 20th centuries.

PETER: But is doing something really well, or having a lot of power in that sense, a threat in a similar way to a monopoly?

MATT: This is where I really think that the economic origin of the term monopoly is unhelpful, because I think that the problem that monopolies pose is a concentration of power which is difficult to dislodge. It’s a concentration of power that by its nature increases. It arises perhaps apart from the market position, and so it maybe has its origins in economic phenomena, but the problem that matters the most is the increasing scope of the corporate entities’ control over your actions.

What we’re wary of, I think, is this sense of a positive feedback loop for corporations operating on a certain scale, with data at a certain scale, that makes it increasingly difficult for alternatives to emerge, whether those are other companies or other ways of life. That’s what seems to me like the evil that we’re concerned about.

BRENDAN: I like your framing a lot better, Matt: how do we dislodge people who have ended up in positions of power? I think that the “data monopoly” term that we initially came up with and built a whole reading group around puts an economic slant on things, that creates this self-disqualifying thought loop. Every time we’ve gone to talk about a data monopoly, we want to talk about the sense of powerlessness.

There is a point where economy of scale tops out: Google gets so deep into your life that nobody else could ever amass such a complete picture, because of these positive feedback loops. It becomes something that could never be replicated, and the problem worsens.

But I think that’s not nearly as important as the question that you’ve put forth, Matt: how do we have a framework for understanding a threshold at which it feels like someone has amassed a picture of you that you are now beholden to, instead of it being beholden to you?

KELSEY: The “can you walk away from it” metric?

PETER: I want to categorize the data a bit: there’s my information. It’s the things that are my interactions and things that I say, my emails and so forth. Control over that is one bucket. There’s information about me and how I’m seen, and then there’s information that’s basically a commodity: whether it’s Getty Images, or Google’s indexes, or Amazon’s listings, things that probably could be reproduced, things that have been published, public observations about the world, Google Maps and so forth.

Each of those things is its own domain. The concerns are very different, and the tools we have available are very different. Your questions about the role of open source, the role of government, the role of trust in non-governmental organizations like Google and Apple—I’d love to get a chance to talk about the more personal stuff.

Disrupting data monopoly as an individual

BRENDAN: One idea for a heuristic: every now and then, I’ll try and change my behavior really aggressively on the internet, including writing bots that will go to very different websites, or do all kinds of weird stuff, just to see if I can shake the ads, the bucketing I’m in. It’s purely for fun, but it’s fun to see how easy it is to get the machine to think that you’re somebody else, and it makes me feel really re-empowered. That feels like a guerrilla tactic, and a last resort to what is admittedly a sense of helplessness in the face of a very complete picture of who I am as a person.

KELSEY: That comes back to last year’s reading of “Obfuscation: A User’s Guide to Privacy and Protest”: guerrilla tactics to shake profiling.

PETER: To me, those are issues of privacy and autonomy. Are you saying those are more interesting than data monopolies? Or are you saying that those are related to the ideas of monopolies?

BRENDAN: I think it is related; I think we have the framing wrong. And then I want to talk about that sense of helplessness. Where does that come from? And how do we take it back?

PETER: I think that’s the drive behind the whole decentralization wave, the desire to cut free.

JONATHAN: I think there’s an interesting test for this. For those of you who use Apple devices, do you feel like Apple has a monopoly on your data?

I don’t feel the same way about Apple that I do about Facebook and Google. There’s definitely a lock-in; there’s a running joke with my roommates about how we will not let someone into our apartment who is not on iMessage because we can’t have the roommate group chat. But fundamentally, I don’t feel the same lack of control over my information.

PETER: The lock-in, in Google, Facebook, and Twitter’s cases, is actually making things free. It can take all the oxygen out of the system.

Manipulation through profiling

JONATHAN: Maybe it is business models versus motivations. When a business model is aligned with ads and the access of your information, they will make more money if you stay on. Now they have a motive to continuously engage you, even to the point of psychological manipulation. Apple, while overpriced, is very direct. You pay money and that’s the end of the transaction.

BRENDAN: I think that’s a really important characteristic. It seems that your belief that you are not being dominated by a power structure is tied to your faith in the institution that you are participating in. Fair?

JONATHAN: I think so. There are implicit choices built into the technology stack based on how they came to be.

As an experiment, I signed up for Facebook’s ads platform, just to see what I would have access to if I was an advertiser. You can put in a bunch of keywords and specify a geography and stuff. But Facebook now lets you see what they think they know about you, and it’s wildly off.

In this black box, people get really worried about all the ways in which this can be misused and used incorrectly. And especially when the business model skews heavily in Facebook’s favor, it would benefit greatly from abusing that trust.

KELSEY: We’re all kind of ribbing on how the targeting is honestly not that good. But I think it was “A Human Algorithm” that was making the point that just because they’re not good at it now, doesn’t mean that the huge amount of data that they have and are continuing to gather won’t become usable in much better models. And in fact, that is the most likely outcome.

MATT: Perhaps the term “data monopolies” is not so useful; and maybe what we really care about is the coercive power of data stores. Maybe there’s something to be said for the way in which the distribution of that data creates coercive power. If you’re the principal actor that has access to this particular data store, then you have the capacity to coerce the market, but also to coerce people into certain kinds of actions.

I think a lot of our thinking in the last three or four years is influenced by this idea that there’s this kind of dark nexus between experimental psychology and fine-grained profiling that creates a manipulative power that is incompatible with democratic practice and also incompatible with the Enlightenment, with the idea that it’s possible to create a state of free individuals who are capable of autonomous rational thought. I think that there’s something about data concentration that undermines that.

Probably that’s something that’s along the lines of what we care about with data monopolies, but “data monopoly” is not actually a great term for describing that.

BRENDAN: The notion of the dark nexus is the capacity to build a constant personalized hallucination, right?

I’m thinking a lot about the right to be forgotten. We are very much at the dawn of a place where people are starting to amass these profiles, and we’re in new territory. We don’t know what the effects of this level of knowledge totality or perceived knowledge totality really are.

At some point, you realize that the database that accumulates of you is the sum choice of every decision you’ve ever made on any platform. Every time you choose to post, every single time you choose to disclose something, it is this little karat, and they continuously accumulate in a unidirectional knowledge acquisition. That is a very scary thing when you look backwards, when you realize that you have no capacity to control the tail of this thing.

We, as a species, forget stuff. I don’t remember every bad interaction I’ve had with everybody in the world. Databases don’t have that forgetfulness property when maintained properly. That’s a very cognitively difficult thing to get your head around. It’s really hard to understand what it’s like to live in a world that is permanent in some ways and ephemeral in others, depending on where you are, who you’re speaking to, and how you are communicating.

I want the capacity to turn to that memory machine and say, I would selectively like you to forget about this. I want a law that says that I as a rational person know that this is a one-way thing and I will suffer some loss in service quality for it, but I can choose to do it anyway.

Data Together is a community of people imagining a better future for data. We engage in a monthly Reading Group on themes relevant to information and ethics. Participants’ backgrounds range across decentralized web protocols, data archiving, ethical frameworks, and citizen science.

This reading group is something your own collective can do too! We encourage you to draw on our notes for this month’s topic. Our notes list readings, call out themes, and suggest discussion questions.

This blog post is derived from our conversation, but is not a replica of it; we rearrange and paraphrase throughout. You can view the recorded call here.