Data Together is a community of people building a better future for data. We engage in a monthly Reading Group on themes relevant to information and ethics. Participants’ backgrounds range across decentralized web protocols, data archiving, ethical frameworks, and citizen science.
This reading group is something your own collective can do too! We encourage you to draw on our notes document template for this month's topic. The template lists readings, calls out themes, and suggests discussion questions.
This blog post is derived from our conversation, but is not a replica of it; we rearrange and paraphrase throughout. See the recorded call for the full discussion!
This month's reading selections begin with traditional notions and practices of stewardship: Pastor Henry Wright's sermon The Stewardship of Time; selections from Haa Tuwunáagu Yís, a collection of Tlingit narratives compiled by Nora Marks Dauenhauer; and Tending the Wild by Kat Anderson, a view into Ohlone land stewardship practices.
We then touch on present-day data preservation principles: Theory and Craft of Digital Preservation by Trevor Owens; and the Society of American Archivists’ definition of “post-custodial theory of archives.”
We continue with two studies of modern-day institutions, looking at the ongoing challenges each faces and how it approaches data stewardship: an ecological research network (Enriching the Notion of Data Curation in E-Science: Data Managing and Information Infrastructuring in the Long Term Ecological Research (LTER) Network by Helena Karasti, Karen Baker, and Eija Halkola) and a university library (Post-Custodial Archiving for the Collective Good by Hannah Alpert-Abrams, David A Bliss, and Itza Carbajal).
Finally, we finish with selections from Nadia Eghbal's report Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure on the problems facing open-source software and how to sustain digital infrastructure.
These readings explore the definition of stewardship; proper use as a form of caretaking; longevity and the need for institutions; the necessity of curation; and care for the maintainers and stewards themselves.
Please see the bottom of the post for a full list of reading selections.
KEVIN: Before we talk about data stewardship, how do each of you define “stewardship”? Where did you get those ideas from, and how did they evolve?
For example, my religious background was my first exposure to the idea of stewardship.
From the Wright sermon:
Stewardship is the management of various assets that don't belong to you … authority, responsibility, accountability … faithfulness is the minimum requirement for the steward.
BRENDAN: For me, a steward is a librarian, an archivist, somebody with knowledge of library science.
Stewardship has a deep connection to the notion of availability. A steward can keep something accessible and available for a long time. Someone who is stewarding something is taking on a very serious commitment. It's a real responsibility to be a steward of an artifact.
MAUVE: Stewardship is a pretty new concept for me. I don't have any cultural knowledge of stewardship.
KELSEY: I've usually heard the term “stewardship” in two contexts:
One is in Montessori. In Montessori classrooms, everybody has a role in taking care of the space. There's a time reserved at the end of each day to sweep, roll up mats, put away materials. This sends a very clear message: this is your space, and you need to take care of it.
My other context is land stewardship. My dad spent his whole career in timber and timber sales for Alaska Native corporations. These native corporations are set up to steward the resources of the tribe in perpetuity. Tribal members are shareholders, but it's not just an economic relationship. It's also supposed to provide for recreation, spiritual practice, forage– forever.
ERIC: Kelsey raised the issue of ownership when it comes to stewardship. According to the pastor in our video, a steward is not an owner. A steward is taking care of somebody else's stuff.
The pastor's comments define different dimensions of stewardship: ownership, accountability, responsibility, authority. Stewardship involves taking care of something so that others may benefit.
ROB: The sermon captured my concept of stewardship: taking care of and being responsible for somebody else's stuff.
Motives for stewardship
KEVIN: I liked that people touched on ownership. I was chatting with my neighbor, who is an Episcopalian pastor, and they had a great term for stewardship: the “antidote to dominion.” Instead of lording over or controlling things, a steward is involved with the proper care and use of things.
In the Tending the Wild reading, Anderson writes,
…one gains respect for nature by using it judiciously. By using a plant or an animal, interacting with it, where it lives, and tying your well-being to its existence, you can be intimate with it and understand it. (xvi)
In the LTER paper, they talked about how scientists in the field actually like taking notes on paper. It's more visceral. By doing so, they have a better sense of the data and its context as opposed to just numbers in a spreadsheet.
How do we interact with data on a more visceral level? More importantly, how do we get the public, the people we're trying to reach with our technology, to use data in such a way that they will want to steward it?
ERIC: We often think of caretaking as: don't touch it, don’t use it. Anderson, in Tending the Wild, is making the exact opposite point: in order for something to be stewarded, it has to be used. You tie your own well-being into the resource you are stewarding.
That comes back to this question of incentives. Incentives aren't necessarily monetary. How do you incentivize participation in stewardship? How do you get users to see that their own well-being is involved in the care of this data?
KELSEY: I've been thinking about Facebook in this context. We don't like how Facebook handles our privacy and how they monetize our personal data. On the other hand, they keep and show you things you might not have thought to keep. They bring up moments you might have forgotten.
They've created an incentive structure for themselves to become very good long-term stewards. They keep and curate a great deal of data. But they're also very bad stewards when it comes to doing it in a caring way.
BRENDAN: As soon as my brain hears “incentive models,” I hear: transactions, blockchain, economics. That's one interpretation of an incentive model: financial.
Another approach to incentivization is stewardship through dependency: I rely on this data, therefore I will steward it. That's in the same realm of logic as barter systems.
And then there's a third space. The best articulation of it that I've heard is a civic one. We're articulating a moral duty to participate in the preservation of information.
A classic steward– say, a religious figure keeping and disseminating a text, or an archivist maintaining a collection– doesn't depend on the things they are keeping. They feel a duty.
That space is harder to talk about. Even in efforts to create a post-archival or post-steward universe, where our systems are built wholly on blockchain exchange or on a pure barter ecosystem in which we're zealously trading (say, lines of code in exchange for a hat), we still have this middle space: I agree to preserve this because I have a duty to participate.
In trying to articulate why people should change their behavior, are we, along the way, talking about digital citizenship? Should we be thinking more broadly about this whole conversation?
Conscious versus unconscious data stewardship
MAUVE: One cool thing is when people are stewarding data without even thinking about it.
Secure Scuttlebutt (SSB) is amazing in that everyone acts as a steward for their friends and the friends of their friends. It's completely organic.
You don't have to opt in to storing the data, or even thinking about data. It's on the protocol level.
SSB is a gossip protocol in which data is hosted by users, who share one another's data based on whether they are “friends”. Learn more on SSB's concept page for their gossip implementation.
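Mauve's “friends and the friends of their friends” can be pictured as a walk over a follow graph. The following is a minimal sketch, not SSB's actual replication code: it assumes the follow graph is available as a plain dictionary, and `feeds_to_replicate` and `max_hops` are hypothetical names. A breadth-first search out to two hops returns the feeds a peer would end up storing.

```python
from collections import deque


def feeds_to_replicate(follows: dict[str, set[str]], me: str, max_hops: int = 2) -> set[str]:
    """Return the feeds `me` would store: everyone reachable within `max_hops`
    follow edges (1 hop = friends, 2 hops = friends of friends).
    Hypothetical illustration, not SSB's implementation."""
    hops_from_me = {me: 0}
    queue = deque([me])
    while queue:
        feed = queue.popleft()
        hops = hops_from_me[feed]
        if hops == max_hops:
            continue  # don't expand past the hop limit
        for friend in follows.get(feed, set()):
            if friend not in hops_from_me:
                hops_from_me[friend] = hops + 1
                queue.append(friend)
    return set(hops_from_me) - {me}
```

For example, with `follows = {"alice": {"bob"}, "bob": {"carol"}, "carol": {"dave"}}`, calling `feeds_to_replicate(follows, "alice")` returns `{"bob", "carol"}`: Alice stores Bob's and Carol's feeds but not Dave's. Each new follow can therefore bring in more than one feed.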
I use this data because it's the digital aspect of my social life. But by using it, I am stewarding it. I am the owner of the data, not a company like Facebook. You gain access to it through my stewardship.
In SSB, people create data, and share it directly with each other as they use it. I think that's really powerful. That's a way we could go forward.
ROB: Pushing the concept of stewardship down into the protocol is good in that it makes it easy. It’s core to its use. But I worry about the ways in which that could hide stewardship from conscious consideration.
That is a big part of what's different in the distributed web: stewardship should be conscious. You should think about where it physically exists. A thing only exists if it's well stewarded by multiple people.
I think it's critical that, if we want these systems to work well, stewardship gets surfaced somehow.
BRENDAN: From the reading on Post-custodial Archiving, there's a line that just jumped off the page:
The continuous custodianship of material objects can no longer be the focus of archival practice. It becomes necessary to shift traditionally archival labor to record creators. (6, Alpert-Abrams et al.)
SSB seems to have that. It pushes that labor down into the protocol level by automating the dissemination of data. By virtue of viewing a thing, you are actually seeding it.
That comes with some complications, particularly with respect to scale. A lot of these patterns only work if we're not taking up all of users’ hard drives, and if we're not violating users' expectations.
Challenges of scale
MAUVE: Large quantities of data are a major challenge on a distributed web. Even one ginormous dataset makes the system much harder to take care of– for instance, the scientific data from the Large Hadron Collider (LHC).
At that point, the data is not really being shared on a person-to-person level. Higher-level organizations are needed. You're not going to have people stewarding it as individuals. But it isn't people-type data, so maybe that's okay?
ROB: I think the concerning issue isn't so much scale as scope.
Scale is more a question of commensurate scale: I think you're right, Mauve, that it's not feasible to expect individual users to also be stewards of the LHC's data. But there are other institutions like the LHC with the same needs for co-work and co-stewardship.
The problem is the aggregation of 5 billion users’ data (“people” data) by a company like Google, as opposed to a community of 1000 people storing each other's data, or even smaller, 100 or 10 people. Maybe it's a lot of data for each of those people, but that's fine because it's a scope we can make meaningful.
What we want is to have intentional communities, where we can think consciously about who we're stewarding for and communicating with.
Storage and user expectations
KEVIN: I just joined SSB at DWeb Camp, and my first question was, “How much data is this taking up on my hard drive? How much do I add on average when I follow someone?”
An SSB developer showed me how to check SSB's data usage on the terminal. I started at 200MB. Now I follow about 30 people and it's up to 1.5GB. How did I get here?
If you're taking up that much space, and you're not showing people that, you're not thinking about all the use cases.
I've had to struggle with wrangling gigs on my phone, so invisible stewardship of others’ data scares me.
ROB: “How much space is other people's stuff taking up on my machine?” is an important question in any of these approaches where the producers of records are also the custodians.
That made me think of torrent trackers. Back when I was more into torrenting, community wisdom was that you should shoot for a 1:1 balance of downloading and seeding. Thinking about that now, I started to realize, if you do want to steward data in a community context, your responsibility is to store a lot more of other people's stuff than your own.
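Rob's realization can be made concrete with some back-of-the-envelope arithmetic. This is a minimal sketch under our own simplifying assumptions, not anything from the readings: every member holds the same amount of their own data, each member's data is kept in `replicas` copies (one of them on the author's own device), and copies are spread evenly across members. The function name `storage_per_member` is hypothetical.

```python
def storage_per_member(own_gb: float, replicas: int) -> tuple[float, float]:
    """Return (own data, other people's data) in GB stored by each member,
    assuming equal-sized contributions, `replicas` copies of every member's
    data (one on the author's device), and an even spread of copies."""
    if replicas < 1:
        raise ValueError("each member's data needs at least one copy")
    total_per_member = own_gb * replicas
    return own_gb, total_per_member - own_gb
```

With 1 GB each and three copies, `storage_per_member(1.0, 3)` returns `(1.0, 2.0)`: already twice as much of other people's data as your own. If everyone in a 100-person community stores everything, `storage_per_member(1.0, 100)` returns `(1.0, 99.0)`.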
BRENDAN: Decentralization requires us to shift the labor of stewardship. When we decentralize something, we're pushing out the burden of keeping that thing available and online. We're turning everyone who used to be a thin client into a steward.
We're upset by Cambridge Analytica aggregating personal information for uses we don't support. But many of the remedies will involve moving work onto people who weren't doing work before. Think back to Paul Frazee's discussion of thick versus thin clients, and the levels of control that come with each, from earlier reading groups: we're now saying, you have new responsibilities.
It's hard to change the rules of the game. Everyone's been kind of freeloading on a revenue model based on treating the data people create as a kind of labor that can be monetized. The decentralized web is about reclaiming that labor. But people may not be willing to pay for things they're used to being “free” in order to regain control.
If you become a node in a decentralized system, you become a steward overnight. You might not even realize it. How do we have that conversation in a meaningful way, that allows somebody to see the value of taking the SSB road?
ROB: How do we provide structures to make it easy to steward others’ data, and the cultural signifiers to make that desirable? Where do local institutions fit in when it comes to offloading some of that burden? It's probably not feasible for everybody.
The way to solve that, theoretically, is through some sort of community institution that can bear some of the custodial load. For example, you could have a local community pinning service, something that we all maintain where all of our data is redundantly stored, so community data doesn't take up 100 times the space of my own stuff on my hard disk.
If this is important, we have to find ways to support the idea that the more capable or affluent you are, the more of other people's stuff you're storing. What kind of community structures can or should exist to offload burden?
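One simple way to encode “the more capable you are, the more you store” is to split the community's storage load in proportion to each steward's spare capacity, with a shared pinning service counted as one more steward. This is a hypothetical sketch of that single allocation rule, not a description of any existing pinning service; `share_load` and the steward names are illustrative.

```python
def share_load(total_gb: float, capacity_gb: dict[str, float]) -> dict[str, float]:
    """Split `total_gb` of community data across stewards in proportion to the
    storage each can spare, so better-resourced stewards hold more.
    Hypothetical allocation rule for illustration only."""
    total_capacity = sum(capacity_gb.values())
    if total_capacity <= 0 or total_gb > total_capacity:
        raise ValueError("community does not have enough spare storage")
    return {
        steward: total_gb * spare / total_capacity
        for steward, spare in capacity_gb.items()
    }
```

For 15 GB of community data and spare capacities of 20, 8, and 2 GB, `share_load(15.0, {"pinning_service": 20.0, "member_a": 8.0, "member_b": 2.0})` assigns 10, 4, and 1 GB respectively. Proportional splitting is only one option; a community might instead cap what any individual member is asked to hold.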
BRENDAN: I think a lot about the Internet Archive (IA). No one thought it would be important to archive Trump's tweets pre-election. I think a lot about all of these things that aren't valued by society today but will be very valuable for somebody later.
That’s a lot of data that somebody has to hold. How do we hold it?
ROB: My grandmother died last year, and we had to split out her stuff: What needs continued custodianship in the family? Who in the family can provide that? Even when the person to whom it was originally meaningful is no longer around, there's still a sense of importance.
Justice aspects of stewardship
KELSEY: This sounds like something I've been thinking of as “the weight of heritage”– an obligation to steward an ever-growing inheritance of data, knowledge, wisdom, and physical artifacts.
I think there are two forms of stewardship that we're talking about here. One of them is care, in the present, of a thing. That's the kind of stewardship you do for communities you care about: it's the way something reaches its community.
The other form of stewardship, which I think is more difficult, is the long-term care of a thing. This involves either aggregating greater amounts of knowledge as time progresses, or pruning. You will lose data whether you choose to or not– it's not just storage that matters, but also the ability to curate.
If we want to keep our heritage, how do we manage all of this weight? How do we make permanent choices on behalf of other people? How can we let people make those choices for themselves within the platforms and protocols that we design? Pulling from justice: who draws the line on what gets kept, why, and for whom?
ROB: A section of the post-custodial reading focuses on our responsibility for custodianship. It suggests that our conception of the common good is broken, and unable to fulfill that responsibility in the current economic and cultural environment.
They turn to the idea of “collective good”, which is focused on: how do you choose what you are responsible for? What is the cultural system within which you are working and thinking when you make that choice? Who is or isn't actively saying, “this is important to me”? Who are you saving it for, and how do we make those decisions in an equitable and just way?
If you operate from your personal conception of the common good, to what extent is that really good enough, versus just a product of your context? If you're in a position to choose what our society preserves, you're probably also coming from an affluent, colonial, white context.
I want a discussion about how you determine what falls into the collective area.
Trust, deletion, and the scope of technology
MAUVE: An important part of that is getting rid of the choice: should I store this, or should I not? Right now, we have scarcity of storage. If storage isn't scarce, you don't have to decide whether to keep something. Instead, it's: which things do I want to purge?
Storage is expensive, but it's not that expensive. I think a lot of storage scarcity is artificial. We're getting phones that have 32GB of storage. By 2019 standards, that sucks. There are SD cards that can store terabytes. There's a lot we can do to get rid of that problem.
KEVIN: Storage is cheap enough that we don't have to make the choice whether to keep everything, but I do want a choice. When you post something to SSB, it's uneditable, immutable.
I don't want everything to exist forever. I want a choice. Even if we could store everything, I think there's a right to be forgotten.
We have to ask ourselves, what do we want to steward and maintain for ourselves? I don't want my growing pains to be other people's entertainment. I don't want my mistakes to haunt me forever, because I grow.
What is it that we choose to keep? What is it that we choose to let go of?
MAUVE: Deletion is a very big question, and not enough people talk about it.
If you publish something and it replicates across the world, that's scary. Anything that has ever been on the internet may still exist. But what if you had an “unpublish” button, and your data only replicated to people you trust to respect that button?
As the people building applications, we have to give users informed understanding. We have to say: this is available forever; or, this is the level of trust you're giving your audience.
KEVIN: Is it possible for messages to, say, self-destruct in 24 hours?
MAUVE: If you trust that everyone receiving the data is going to respect a protocol-level request to delete, then you're golden. But as soon as there's a malicious actor…
In the case of Snapchat and disappearing photos, someone could screenshot your private photo. Even if you have a screenshot notifier, they could use another camera to take a picture. You wouldn't know. The closest we can get is to make software that behaves well and hope that people don't circumvent it.
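Mauve's point about well-behaved software can be made concrete. The sketch below is a hypothetical cooperative peer, not SSB's or Dat's actual behavior, that honors both a time-to-live (like Kevin's 24-hour self-destruct) and an explicit delete request. Every name in it is illustrative, and everything depends on the peer choosing to run this code: a modified client could skip the expiry check or ignore `handle_delete_request`, which is exactly the trust problem.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class CooperativeReplica:
    """Hypothetical peer that stores other people's messages and honors
    deletion. Compliance is voluntary: nothing here stops a modified
    client from keeping a copy."""

    # msg_id -> (body, expiry timestamp, or None for "keep until deleted")
    messages: Dict[str, Tuple[str, Optional[float]]] = field(default_factory=dict)

    def store(self, msg_id: str, body: str,
              ttl_seconds: Optional[float] = None,
              now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        expires_at = now + ttl_seconds if ttl_seconds is not None else None
        self.messages[msg_id] = (body, expires_at)

    def handle_delete_request(self, msg_id: str) -> None:
        # A well-behaved peer drops its copy; an unknown id is a no-op.
        self.messages.pop(msg_id, None)

    def read(self, msg_id: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        entry = self.messages.get(msg_id)
        if entry is None:
            return None
        body, expires_at = entry
        if expires_at is not None and now >= expires_at:
            # Expired: purge on access, like a self-destructing message.
            del self.messages[msg_id]
            return None
        return body
```

Storing a post with `ttl_seconds=24 * 60 * 60` makes it readable for a day and then gone, but only on peers that run code like this; the protocol can request deletion, not enforce it.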
KELSEY: Trust changes over time, too. Think of marriage, followed by divorce. That's someone you planned to spend the rest of your life with, sharing everything. Then something messy happens. The person you trusted with your life and everything you own is no longer worthy of that. Our system needs to account for that.
MAUVE: This is where centralization wins, actually. If every time you want to access something it has to go through a third party, access controls are way better. Somebody could save that data for later, but the default access is through an arbiter.
ROB: Facebook can be your arbiter for a given post. As long as you trust Facebook to be a fair arbiter, Facebook can lock the other person out of a post. Do you trust Facebook?
A bunch of this is about software design, but a lot is about trust. None of these systems can actually remove trust.
It concerns me how much people in the DWeb and blockchain contexts talk about not needing trust. A discussion like this shows how critical trust is in a decentralized system, even more so than in a centralized one: there are more people involved, and we're often removing that arbiter position.
BRENDAN: It's important to not ask our technology to do things that aren't possible.
I'm currently speaking into a microphone. If I say something stupid, I can't unsay it. There's no back button; you've all already heard it in the experiential world.
We try to build protocols to automate away the worst aspects of society, but you can't ask the impossible.
Stewardship is a dynamic thing. Part of the awesome responsibility of stewardship is understanding authorial intent. A steward upholds the rules of the road, which have to be set out beforehand.
You don't care about a delete button, right up until the moment you really, really need one. You don't care if your information is encrypted, until you suddenly do. It's not possible for us, as protocol designers, to think of all edge cases and design for them.
I think it's incumbent upon us to really put the effort in to translate our technologies into useful metaphors. I think that effort belongs squarely on the people authoring the protocols.
MAUVE: People often ask me: “What if I publish copyrighted material on Dat?”
Well, you should get sued! That's on the legal level, not the protocol level.
A lot of tech people think exclusively about what is cryptographically possible, but miss the social and legal aspect. Absolutely, the tech should have good defaults that reduce onboarding pain. But in the end, it's societal pressures that push things forward.
Focusing on just the tech is not enough. I think the tech is the least interesting part.
On a bigger scale, what are the cultural values? What are the morals?
IPFS has Filecoin, where stewardship is in the protocol. You just publish your files, and the free market provides. There's no human involvement. It's one approach. The alternative is to have people agree on what to do, and that's really important.
If a partner has confidential data and they're abusing that trust, they should expect social repercussions. The tech should try to protect us as much as possible, but bad actors will always exist.
ERIC: In Tending the Wild, the author makes the point that technical knowledge can be embodied in culture. “When do you start a fire to help plants regrow” is technical stewardship, but all of that affects values, beliefs, and behaviors.
KELSEY: We started by talking about stewardship in several contexts, often with physical aspects. I think that helped ground us as we moved through a digital-focused conversation around what we're keeping, why we're keeping it, and who we're keeping it for.
I'd like to close with a quote from a favorite short story, which is about not just physical artifacts but experiences, and thus the inevitable disappearance of data over time.
This is from “The Witness,” by Jorge Luis Borges:
But something, or an infinite number of things, dies in every death, unless the universe is possessed of a memory, as the theosophists have supposed. In the course of time, there was a day that closed the last eyes to see Christ. The battle of Junin, and the love of Helen each died with the death of some one man. What will die with me when I die, what pitiful or perishable form will the world lose? The voice of Macedonio Fernandez? The image of a roan horse on the vacant lot at Serrano and Charcas? A bar of sulfur in the drawer of a mahogany desk?
KEVIN: That's a great way to not end the conversation, but leave us thinking. I hope people think about how to continue this conversation: defining how to be good stewards, and deciding what should be preserved, what we want to preserve, or letting people choose. Thanks for an awesome conversation.
- Pastor Henry Wright, The Stewardship of Time, 2019: 2:50-6:17. Transcript
- Nora Marks Dauenhauer, Haa Tuwunáagu Yís, 1990: Elders Speak to the Future:
- Kat Anderson, Tending the Wild, 2005:
- xv-xviii (Preface)
- 2-6 (Introduction)
- 358-364 (Coda - Indigenous Wisdom in the Modern World)
- Trevor Owens, Theory and Craft of Digital Preservation, 2017:
- 6-9 (Sixteen Guiding Digital Preservation Axioms)
- 122-130 (Conclusion: Tools for Looking Forward)
- Helena Karasti, Karen Baker, & Eija Halkola, Enriching the Notion of Data Curation in E-Science: Data Managing and Information Infrastructuring in the Long Term Ecological Research (LTER) Network, 2006 in Computer Supported Cooperative Work. 15. 321-358. 10.1007/s10606-006-9023-2:
- 6-11 (Challenges of Data Sharing)
- 14-16 (Intensive Data Description)
- 23-27 (Discussion)
- 30-33 (Conclusions)
- Society of American Archivists definition of post-custodial theory of archives
- Hannah Alpert-Abrams, David A Bliss, Itza Carbajal, Post-Custodial Archiving for the Collective Good, 2019:
- 5-12 (Part 1: Post-Custodial, Anti-Colonial, Neoliberal & Part 2: Labor)
- 18-21 (Part 4: From Common Good to a Collective Good)
- Nadia Eghbal, Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure, 2016:
- 8-10 (Executive Summary)
- 40-45 (Digital Infrastructure Changes Frequently)
- 53-58 (Why do people keep contributing when they’re not getting paid?)
- 60-65, starting with “Structurally…” (re decentralization, money, and project stewardship)
- Quote on page 75
- 125-130 (How to sustain)
- Andrew Russell & Lee Vinsel, Hail the maintainers, 2016
- LTER (1990): Long-Term Ecological Research and the Invisible Present, and Long-Term Ecological Research and the Invisible Place