From crowdsourcing to knowledge communities: Creating meaningful scholarship through digital collaboration

Jon Voss, Historypin, USA, Gabriel Wolfenstein, Stanford University, USA, Kerri Young, Historypin, USA


Over the last three years, Stanford University and Historypin have teamed up to pilot three projects aimed at testing how humanities researchers might use different types of crowdsourcing or community sourcing to further research, and to ask questions that would be difficult or impossible to answer in more traditional modes of inquiry. We found not only that there will be a much more collaborative future among academic humanists, memory institutions, and knowledge communities, but also that only through such collaborations can some of these questions be explored. In this paper, we share the details of the various projects and their outcomes, as well as a host of tools we found helpful in the process of identification, outreach, and collaboration with knowledge communities.

Keywords: crowdsourcing, community engagement, knowledge communities, outreach, partnerships

1. Introduction

Over the last three years, Stanford University’s Center for Spatial and Textual Analysis (CESTA) has led a research project in partnership with Historypin to examine the potential for leveraging crowdsourced information about photographic, map, and textual content for crowdsourced humanities research. The project was funded by the Andrew W. Mellon Foundation and sought to study elements of user interface, user interaction, community engagement, and collaborative partnerships. While the project focused largely on humanities research within the context of higher education, the findings are widely applicable to museums, many of which participated in the study.

We found that there is an important distinction within the realm of crowdsourcing for more complex and collaborative tasks and processes that revolve around knowledge communities: small networks of neighbors or enthusiasts representing a group of people that could be systematically organized to share and participate in research for a common aim. Engaging with these communities often requires longer time frames than simpler task-driven crowdsourcing may allow, and is necessarily much more collaborative than extractive. Furthermore, there are important social benefits for the institutions and communities alike that fulfill the new missions of twenty-first-century cultural heritage and academic institutions.

Throughout our research, we found that not only will there be a much more collaborative future among academic humanists, memory institutions, and knowledge communities, but that only through such collaborations can some questions relative to the humanities be explored.

In this paper, we summarize the methodology of the project and present key learnings and examples that we hope will assist institutions that are interested in engaging with knowledge communities and provide a way to practically approach such projects with realistic expectations and ideas for successful implementation.

2. Methodology

Our focus in this study was the practical application of crowdsourcing techniques to the work of three teams of academic researchers, all with different objectives and different subject matter. Within this study, the teams piloted three different projects aimed at testing how humanities researchers might use different types of crowdsourcing or community sourcing to further research, and ask questions that would be difficult or impossible to answer in more traditional modes of inquiry. Each of the three projects sought to engage the public, or some sector of the public, in a different way and through different means.

In a sense, we ran a controlled experiment involving academic researchers, non-academic Web media bodies, and a large number of cultural heritage institutions, operating on three variables, each defining a particular aspect of crowdsourcing and exploring how it might contribute to academic research and, concomitantly, to community engagement. Two of the three projects, Year of the Bay and Living with the Railroads, focused on community engagement strategies with Historypin that included collaboration with museums and other cultural heritage partners that are particularly relevant to the museum community—the former with a broad public in specific localities, and the latter with a niche community of topical interest. A third project, 500 Novels, focused on crowdsourcing through the use of Amazon’s Mechanical Turk and the subsequent geographical display of the results on Historypin. The outcome and learnings of that specific project added significantly to our understanding of crowdsourcing tools and approaches, but because it focused less on community engagement and did not partner with other institutions, it is not something we will attempt to cover in this paper.

2.1 Year of the Bay

The Year of the Bay experiment was designed to test the experience of conducting crowdsourcing for the humanities in conjunction with major public events that could garner widespread attention and enlarge the audience for a crowdsourcing project. Academic research was led by Jon Christensen. The metaphor of a funnel is often used to describe participant engagement with crowdsourcing, because many more people interact with most projects superficially (at the wide end of the funnel) than travel through the funnel and engage deeply with projects. In this case, our aim was to widen the funnel as broadly as possible in order to increase the number of participants, engage as diverse an audience as possible, and increase engagement proportionally.

Our hook, the Year of the Bay, was 2013, a year that brought the America’s Cup races to the San Francisco Bay, along with completion of a new span of the historic San Francisco-Oakland Bay Bridge (one of the largest public works projects in America at the time), the 150th anniversary of the Port of San Francisco, the opening of a new home for the Exploratorium on the Bay in San Francisco, and exhibitions at the California Historical Society and the Oakland Museum of California focused on the bay. Our strategy was based on tapping into the enthusiasm for the bay and for local history and archival materials evidenced by decades-long campaigns to save the bay, and by several earlier and ongoing history projects and blogs in the area. Our goal was to collaborate with media organizations and museums, libraries, and archives to bring people to Historypin—through a custom URL—to engage with archival materials from nearly a dozen cultural heritage partners by pinning them to locations and providing other metadata, as well as contributing new materials from organizational, individual, and family archives. Our hope was twofold: 1) that participants would help generate useful, accurate, and meaningful metadata for archival sources lacking metadata, and 2) that new archival materials would diversify and enrich our understanding of the environmental history of the San Francisco Bay and of different cultural understandings and practices related to the bay. The history of the bay has long been dominated by a standard environmental narrative that we hoped new sources would enrich and complicate, adding to new understandings of environmental history.

Throughout the course of the project, we designed a new prototypical metadata crowdsourcing tool we called “History Mysteries,” which allowed users to suggest tags, dates, titles, geographic locations, and pitch/yaw/zoom camera angles (through a Google Street View interface). We also worked with Stamen Design to utilize their prototypical georeferencing tool, which allowed the public to “warp” maps to the globe and put them in place.
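As a rough illustration of the kind of record a tool like History Mysteries collects, the sketch below models a single user suggestion. The field names here are hypothetical and for illustration only; this is not Historypin’s actual schema or API.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional

# Hypothetical record for one crowd-sourced metadata suggestion; every field
# is optional except the item being enriched and the contributor, since a
# participant may know only one fact (a tag, a date, a location) about an item.
@dataclass
class MysterySuggestion:
    item_id: str                       # the archival item being enriched
    contributor: str
    tags: list = field(default_factory=list)
    title: Optional[str] = None
    date_text: Optional[str] = None    # free-text dates ("ca. 1923") are common
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    pitch: Optional[float] = None      # Street View camera angle
    heading: Optional[float] = None    # (yaw)
    zoom: Optional[float] = None

s = MysterySuggestion(
    item_id="ferry-building-001",
    contributor="bay_historian",
    tags=["ferry", "Embarcadero"],
    latitude=37.7955, longitude=-122.3937,
    pitch=10.0, heading=65.0, zoom=1.0,
)
print(asdict(s))
```

Keeping dates as free text rather than forcing a structured format reflects a practical lesson of this kind of work: community contributors often know "around when" rather than an exact date, and that partial knowledge is still valuable.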

2.2 Living with the Railroads

Following the work of Richard White, Living with the Railroads sought to use crowdsourcing to learn more about the social, cultural, and environmental impact of the development and expansion of the railroads in the American West, and hopefully eventually across the United States (White, 2011). The crowd here is neither the more amorphous crowd of 500 Novels nor the crowd of those generally interested in the San Francisco Bay Area in Year of the Bay; rather, the community we sought to engage in Living with the Railroads is that of railroad enthusiasts. This is a niche group made up of clusters of enthusiasts with typically very specialized interests, from model railroaders who study particular stretches of current or historical tracks, to trainspotters who document different types of trains, to photographers and photo historians. This project, therefore, had to explore and lay the groundwork for interacting with this specialized community in a different manner than either of the other two projects. Instead of interacting with a crowd where expertise lies on the side of the researchers (CESTA) or the content aggregator and community builder (Historypin), in this instance it is the community members who have the expertise. This poses a different set of challenges, including how to approach the community, what will engage them, and how to get them to stay connected (Dunn & Hedges, 2012a).

The goal, in conjunction with Historypin, was to get railroad enthusiasts to upload their data, identify images and documents whose provenance is unknown, and help facilitate the growth of connections between enthusiasts and enthusiast communities. That is, the project did not just seek data from people; it hoped to get people to see the Historypin site, and the partnership with Stanford, as something to their benefit, with data collection as a side effect for CESTA and Historypin. Core to this effort was building a working group of principal investigators, research assistants, and train enthusiasts to figure out the best ways to solicit broader enthusiast participation, as well as to work with Historypin to develop the website.

3. Crowdsourcing and knowledge communities

Before we get into the practical lessons we’ve taken from the aforementioned projects, it’s useful to draw out an important distinction in the kind of crowdsourcing that evolved from them.

We’ve seen that digital-transcription projects such as Transcribe Bentham, in both the humanities and “citizen science,” tend to blend a combination of “crowd” and “community,” a distinction described by Caroline Haythornthwaite (Causer & Wallace, 2012). In the former, loosely associated individuals perform autonomous and relatively simple tasks, while the latter often requires collaboration and a more social and coordinated element (Haythornthwaite, 2009). In Railroads and Year of the Bay, we were not dealing with questions of transcription or necessarily easily defined tasks, but rather more complex questions of contextualization and even contribution with a specific goal in mind, still meeting definitions of crowdsourcing but leaning toward more complex tasks requiring an engaged community (Dunn & Hedges, 2012b). In both projects, there was a collaborative approach between the participating institutions and the individual participants toward a common aim, however broadly defined—be it exploring the human and ecological history of the bay or tracing Western expansion through life around the railroads. This also speaks to the potential for strengthening and measuring community ties through this work, which Nick Poole and Nick Stanhope describe as a new approach not only to digitization, “but of a new role for museums as places of engagement and participation in which the disciplines of curatorship, digitisation, collections management and documentation are shared with the user in a joint effort to develop shared cultural capital” (Poole & Stanhope, 2012).

As these projects progressed, it was helpful for us to understand and define the various communities we were increasingly in partnership with not as users or as a crowd, but as “knowledge communities.” Whether they were local residents or history enthusiasts who were knowledgeable about their area, or in some cases obsessed with solving a history mystery that was presented to them, or had specific topical knowledge about details pertaining to one railroad line, these small networks of neighbors or enthusiasts represented a group of people that could be systematically organized to share and participate in research for a common aim. After two years of working with specific individuals and groups of individuals, we came to see this as an important distinction within the realm of crowdsourcing, and an approach that is most importantly collaborative in intent and design rather than extractive and focused on the needs of an institution or a researcher. This approach has also been termed community-sourcing (Sample Ward, 2011).

Literature on crowdsourcing generally assesses the process (which remains frustratingly variably defined) on the basis of whether some sort of utilitarian goal is met: entries are added (Wikipedia), materials are transcribed (Transcribe Bentham), data is classified (Galaxy Zoo). In this, understandings of crowdsourcing, both definitional and in terms of results, remain wedded to a market model. Can you get data or analysis? Can you get it more quickly or more cheaply? These are the primary questions. The question that motivated our project—whether crowdsourcing can be useful to humanities researchers pursuing humanities projects—found a very different answer. This isn’t to say that we did not also pursue utilitarian questions. Our three subprojects engaged three different types of crowds: the amorphous public of people interested in the Bay Area; the expert community of train enthusiasts; and the paid community of Amazon Mechanical Turkers. All three were, unsurprisingly, only partially successful in terms of the research questions initially posed. There is usable and interesting data (images and metadata uploaded to Historypin for the first two, analysis of the emotionality of literary passages for the latter). But it isn’t perfect, and in some cases doesn’t quite meet the goals of the various PIs. The conclusions we have drawn thus far ask us—and, by extension, other researchers—to look beyond the data to the process itself.

There is increasing segmentation of crowdsourcing approaches in cultural heritage (Ridge, 2014), with great import for the academy as well as memory institutions. Crowdsourcing offers academic humanists a different way of engaging the public, especially in collaboration with non-academic organizations. We suggest that, if researchers have flexibility in their projects, building or collaborating with knowledge communities can result in original research and a level of engagement with the community that is typically missing from university-based research. Without this flexibility, crowdsourcing possibilities become more limited, though still viable, as our work with the Turkers demonstrated.

Furthermore, we feel that museums, libraries, archives, and academic institutions alike increasingly embrace the opportunities they have to positively impact communities not just in terms of cultural programming, but also as partners in strengthening a range of societal measures (Van Thoen, 2014). Meaningful engagement of knowledge communities can have a range of outcomes in addition to the traditional measurements of crowdsourcing for both the institutions and communities, such as increasing interest in and sense of ownership of institutions, strengthening community ties and associational life, and decreasing isolation among frequently marginalized sectors of society such as seniors (Thomson & Chatterjee, 2013).

4. Practical applications

In the following section, we look at specific lessons we learned from the projects, broken into four categories, which we hope will prove helpful in embarking on shared projects with knowledge communities or the public. We don’t think these things necessarily require large budgets and, in the case of museums, they can often be leveraged by existing staff positions in education and outreach, or in some cases in coordination with digital or digitization teams. Most importantly, we see these as part of a holistic and long-term approach to community engagement—one that has the potential to both strengthen communities and increase the standing and holdings of museums.

4.1 Design and expectations

Engaging with knowledge communities will require changes and present challenges to our existing funding models or approaches to crowdsourcing projects.

4.1.1 Recognize the time needed to set the stage for the project

It seems you cannot leave too much time for relationship development in the early phases of a crowdsourcing project. For simple crowdsourcing tasks, you can save time by going straight to Mechanical Turk or another paid service, but when it comes to working with knowledge communities, it takes time to identify your target audiences and gain their trust.

Any major project will face unavoidable hurdles in working with communities as part of a large institutional partnership involving multiple organizations. With Year of the Bay, once we began working with the community groups and cultural heritage partners, we had a head start thanks to the networks that Jon Christensen (research lead) brought to the table. Even once we had the site up and were underway with our communications and engagement strategy, it still took months to start activating some of the partners that had signed on, facing challenges ranging from copyright and licensing on archival photographs to long lead times at museums.

The same was true with Railroads, though it took even longer to gain traction with the various train enthusiast communities. Here it was critical to have direct contact and eventually champions from within the community in order to build trust. In each case, only after nine to twelve months did we have enough traction within the community that we could regularly draw attention to specific mysteries or solicit significant amounts of content.

4.1.2 Co-designing a project with the community

This requirement can be overlooked or glossed over when identifying academic research possibilities, or in museum digitization or enrichment activities that seem ripe for crowdsourcing. However, when considering a project in collaboration with a knowledge community, it’s first necessary to identify what that community or any other stakeholders see as value to the project. Because we are looking at a longer timespan for these types of projects, the research questions may change or evolve as the project progresses, but as long as the project meets some expressed need from the community, you will have much richer content and community expertise to work with.

This can most clearly be demonstrated with our experience with the railroad community. In early discussions following a presentation at the Southern Pacific Historical and Technical Society meeting, it became clear that many enthusiasts active in that community were frustrated that the Southern Pacific archives donated to Stanford some years earlier had not yet been digitized or made accessible to the community in any way. Input from this particular railroad community about what is digitized has helped build trust and alleviate early suspicions that they were being asked to help Stanford without getting anything in return. It has also provided great insight into the holdings themselves.

4.1.3 Measure return on investment on a longer timescale

Similar to the time needed in the start-up phase, collaborations with knowledge communities should on the whole be built with a much longer timeline in mind, ideally five to ten years rather than one to two. It will take time for the project to develop to the point of yielding results that researchers can use. Building trust, recognition, and a culture of sharing in the community is a must, and once that is established, it will yield results. There is also the recognition that working in a collaborative fashion will involve different partners with different resources: it may take one organization longer to digitize, but researchers may then gain access to content that might not otherwise have been accessible at all.

A great example from Railroads: well into the first year of the project, one of the cultural heritage partners, the Western Railway Museum Archives in Suisun City, California, identified and began to digitize and map a collection of hundreds of survey photos along the San Francisco corridor of the Southern Pacific line. Most of these are early twentieth-century photos that had never been seen by the public. Even after the funding for the project ended, the Western Railway Museum Archives has continued to create new tours and add content to the project, drawing in more enthusiasts to engage with this particular collection and the Railroads project itself.

4.2 Methods of engagement

In looking at long-term engagement with knowledge communities, several key strategies have evolved not only in our pilot projects, but in many projects that Historypin has worked on with numerous partners.

4.2.1 Seed the project

We’ve found great success in both projects by working with GLAM-sector partners early on to seed the projects with rich content. This serves several purposes. First, it draws people into the project with interesting content and targets specific communities. Second, it illustrates the kind of content we’re looking for. Finally, by putting up content we don’t know much about, we open the door for enrichment activities like story gathering.

With Year of the Bay, we had a great deal of content from institutional partners including the Oakland Museum, San Francisco Public Library, Chinese American Historical Society Museum, National Archives, and more. Because of Historypin’s extensive network of partners (at time of writing, it has more than 2,100 institutional accounts), we were able to pull in a good deal of content already on Historypin. Working with aggregators such as state and local history organizations or the Digital Public Library of America can potentially help seed projects as well.

4.2.2 Strategic partnerships

Both Railroads and Year of the Bay depended greatly on a network of strategic partnerships, which were put in place early on and evolved throughout the projects. The key to successful strategic partnerships is matching the needs of the communities with the aims of the project. This clearly ties to the idea of co-design, but identifying the partnerships actually has to come first.

In some early failures in Year of the Bay, we found that rather than trying to drum up interest in communities, it was far more effective and efficient to first identify need and interest among existing projects or partners and build from there. After a slow first six to nine months, we started to make quicker strides once we could identify and reach groups already involved in some level of community archiving. Partnerships with media outlets like the San Francisco Chronicle and the Bold Italic blog also allowed us to reach a broader audience at the first layer of our engagement funnel. While media attention always led to spikes in our Web traffic, it did not usually lead to conversion events like new members, history mystery suggestions, or pinning. Those happened through more intimate partnerships that required a bit more individual investment, like long-term work with neighborhood history groups such as the Bernal History Project. This has led us to focus early on in projects not so much on a large general public as on specific strategic partners.
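The gap between traffic spikes and conversion events can be made concrete with some simple funnel arithmetic. The figures below are entirely hypothetical, not measurements from our projects; the point is that each stage of the funnel typically converts only a small fraction of the one above it, which is why strategic partners matter more than raw reach.

```python
# Illustrative engagement funnel with made-up numbers: media coverage widens
# the top, but a project like this lives on the conversion events further down.
funnel = [
    ("site visits", 50_000),
    ("registered members", 1_200),
    ("mystery suggestions", 300),
    ("items pinned", 90),
]

# Report each stage's conversion rate relative to the stage above it.
for (stage, count), (_, prev) in zip(funnel[1:], funnel):
    print(f"{stage}: {count} ({count / prev:.1%} of previous stage)")
```

Tracking stage-to-stage conversion (rather than only top-line traffic) is what let us see that media partnerships widened the funnel without deepening it.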

4.2.3 Taking an iterative approach

Most projects of this scale end up taking an iterative approach to development whether they intend to or not. We were able to plan for this, though it took us some time to be clear about exactly what we would iterate, and to put limits on how far we would take iteration in order to accomplish all of our goals.

Working over such a long time frame, it was important for us to identify modules and stages in all of the projects, and we came to value small investments in early stages, such as mockups and concept development, paper testing, and user-interaction research in very early iterations. Two of our industry advisory panel members, Cyd Harrell (user-interaction research) and George Oates (user-interface design), were instrumental in helping us build this into our processes throughout the project. Ideally, these phases, limits on the number of iterations, and priorities among specific outcomes would be laid out explicitly early on.

4.3 Skills, training, and technology

Crowdsourcing with knowledge communities requires many of the same skills and technology as other crowdsourcing projects, but there are some key differences worth considering.

4.3.1 Building the team

We learned that designing long-term crowdsourcing projects and extensive engagement with knowledge communities requires a high level of teamwork across disciplines, departments, and organizations. We had to make adjustments throughout the project to identify team members with the appropriate skills, or to build in time and training to get up to speed on certain elements. Because outreach is such a critical part of these projects, we found that community organizing experience and training is a necessity, and the young history scholars who excelled in the project tended to lean more toward public history.

Since both of our projects were heavily intergenerational in nature, which is an important measure of the social aims of an organization like Historypin, we found that within the academic setting this was often internalized, with faculty and staff needing to learn from students, or at least depend on them as equal partners, when it came to things like social-media strategy.

Community experts are key partners on the team. While they may not be able to commit as much of their time, it is essential to find times to include them and to build meetings around them, as indicated in the areas highlighted above (co-design and partnerships).

4.3.2 Technology

While crowdsourcing projects are typically enabled by digital tools, a key difference in engaging knowledge communities is that while the outputs may be digital, the interactions often need to happen face to face. In the 500 Novels project, which was almost entirely focused on crowdsourcing for capacity, we relied on the Mechanical Turk service. But for Year of the Bay and Railroads, we utilized a wide variety of digital tools, often complementing or augmenting these tools with in-person meetings. Some of the tools we developed even had that in mind: the history mysteries module could be used effectively in a group setting.
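On the Mechanical Turk side, tasks are posted as HITs, and a HIT's question can be an ExternalQuestion pointing to a task page hosted elsewhere. The sketch below builds that XML payload; the task URL is hypothetical, and actually posting the HIT would additionally require AWS credentials and a call such as boto3's MTurk `create_hit`, with the title, reward, and assignment count as further parameters.

```python
# Sketch of the ExternalQuestion XML accepted by Mechanical Turk's CreateHIT
# operation; the task URL below is hypothetical.
XSD = ("http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/"
       "2006-07-14/ExternalQuestion.xsd")

def external_question(url: str, frame_height: int = 600) -> str:
    """Wrap a hosted task page in the ExternalQuestion schema."""
    return (
        f'<ExternalQuestion xmlns="{XSD}">'
        f"<ExternalURL>{url}</ExternalURL>"
        f"<FrameHeight>{frame_height}</FrameHeight>"
        f"</ExternalQuestion>"
    )

question = external_question("https://example.org/passage-task")
# With boto3, this string would be passed as the Question argument to
# client.create_hit(...), alongside Title, Reward, MaxAssignments, etc.
print(question)
```

Hosting the task page yourself, as this pattern allows, is what makes it possible to present richer interfaces to paid workers than MTurk's built-in question forms support.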

One model we experimented with in Year of the Bay was a “History Hackathon,” which took the idea of the history mysteries tool into an in-person meeting designed to bring together a diverse group of technologists, enthusiasts, historians, and members of the general public to help curators identify specific mysteries (Young, 2013).

4.4 Sustainability and outputs

This final section takes into account some long-term considerations around sustainability and lasting outputs.

4.4.1 Sustainability and preservation

Having relationships with community institutions, museums, and other cultural organizations played an important role in preservation plans. Another key consideration is what role these partners can play in the long-term sustainability of a project, and what resources can be utilized, perhaps as part of existing outreach budget or another cost center. If there is not long-term support, what are possible exit strategies?

4.4.2 Outputs and rewards

The outputs from crowdsourcing with knowledge communities run a wide range beyond the more immediate and measurable metrics such as annotations. It’s important to reward these contributions whenever possible. This proved difficult for us on each of the projects, though we did find ways to do so. Rewards could be as simple as adding credit to comments or social-media mentions, or something more in depth, like organizing special meetings for Railroads contributors with special access and tours of the Stanford archives. It’s been important for us to publish findings from several different angles and, where possible, to involve community partners in sharing their insights from the project at local conferences or events, particularly when we, or they, can highlight their materials.

5. Recommendations for future research

There are two key areas of future research that we see as important for moving this specific area of crowdsourcing forward.

One is the further development of methods and means to effectively assess, cost, plan, staff, and evaluate these types of knowledge-community projects. Approaching this assessment task requires a framework that incorporates social impact and community-building metrics alongside the standard numbers built around footfall, patrons reached, and items used.

A second area of future research is the development of the technical means to share the more nuanced or complex annotations that are often the output of knowledge communities. There is further complexity around data sharing with multiple institutions and cultural heritage aggregators like the Digital Public Library of America, Europeana, and others. The work of the Open Annotation Collaboration on a robust data model is of particular interest.


Acknowledgements

We have so many people to thank for their roles in this project over the last year that it’d be impossible to name them all. The Crowdsourcing for Humanities Research project was funded by a grant from the Andrew W. Mellon Foundation and led by Zephyr Frank at Stanford University, who served as the principal investigator. You can see all of the people who worked on the project on our behind-the-scenes blog. We’d also like to thank the growing community of practitioners and scholars sharing their experience in this rapidly evolving field, not all of whom are cited below.


References

Causer, T., & V. Wallace. (2012). “Building a Volunteer Community: Results and Findings from Transcribe Bentham.” Digital Humanities Quarterly 6(2).

Dunn, S., & M. Hedges. (2012a). Crowd-Sourcing in the Humanities. Connected Communities.

Dunn, S., & M. Hedges. (2012b). Crowd-Sourcing Scoping Study: Engaging the Crowd with Humanities Research. AHRC report.

Haythornthwaite, C. (2009). “Crowds and Communities: Light and Heavyweight Models of Peer Production.” Proceedings of the 42nd Hawaii International Conference on System Sciences. Waikoloa, Hawaii: IEEE Computer Society, 1–10.

Poole, N., & N. Stanhope. (2012). “The Participatory Museum.” Collections Trust. Last modified July 3, 2014.

Ridge, M. (2014). “Crowdsourcing Our Cultural Heritage: Introduction.” In Crowdsourcing Our Cultural Heritage, M. Ridge, ed. Ashgate, 1–13.

Sample Ward, A. (2011). “Crowdsourcing vs. Community-sourcing: What’s the Difference and the Opportunity?”

Thomson, L., & H. Chatterjee. (2013). UCL Museum Wellbeing Measures Toolkit. AHRC and University College London. See also UCL Museums and Collections, “Touch and Wellbeing.”

Van Thoen, L. (2014). “Museums Make You Happier and Less Lonely, Studies Find.” Freelancers Broadcasting Network. Consulted January 30, 2015.

White, R. (2011). Railroaded: The Transcontinentals and the Making of Modern America. New York: W.W. Norton & Company.

Young, K. (2013). “Our Year of the Bay Hackathon at the California Historical Society.” Historypin. Consulted January 30, 2015.

Cite as:
Voss, J., G. Wolfenstein, & K. Young. "From crowdsourcing to knowledge communities: Creating meaningful scholarship through digital collaboration." MW2015: Museums and the Web 2015. Published February 1, 2015.