Art + Data: Building the SFMOMA Collection API

Sarah Bailey Hogarty, New Museum, USA; Keir Winesmith, San Francisco Museum of Modern Art, USA; Matthew Hrudka, SFMOMA, USA; Beth Schechter, Stamen Design, USA


During its current closure, the San Francisco Museum of Modern Art (SFMOMA) has the unique opportunity to engage in a prolonged period of self-examination and experimentation. In this spirit, SFMOMA is developing an API for its permanent collection. Before SFMOMA releases its Collection API to the public, the museum will be the primary user, with the intent to develop a deep understanding of the API’s possibilities and applications for both internal and public use. The questions we ask about the collection provide the guiding principles for development and design of the API. In collaboration with Stamen Design, SFMOMA organized an Art + Data Day to bring together some of the Bay Area’s most innovative creative technologists to ask probing questions about the museum’s collection using the new API. By the end of the day, we hoped to produce some amazing outcomes, and we were open to what they might be: An application? A data visualization? A story? A roadmap to even more robust data for the API? Or perhaps designs for something grander? This event was structured as an un-hackathon, a format inspired by participatory design approaches to creatively develop technology and a rejection of high-pressure, end-product-focused hackathon practices that reward speedy coding over effective problem solving. Rather than racing to complete or compete, the un-hackathon was about collaboration, brainstorming, and better understanding how the API will be used. Aside from an unabashed excuse to geek out and create beautiful data visualizations and nerdy software, Art + Data Day was a meaningful opportunity for SFMOMA to better understand how its API will be used and how it can be improved. Written in collaboration with Beth Schechter of Stamen Design, this paper takes a deep dive into the Art + Data Day findings, emphasizing how they inform digital storytelling, potential gaming opportunities, and innovative on-site digital experiences at SFMOMA, and presents the SFMOMA API.

Keywords: API, design, data, code, collection, technology

1. Introduction

The San Francisco Museum of Modern Art (SFMOMA) is currently closed as it undergoes a massive building expansion. This undertaking encompasses much more than a change to the architectural footprint, as the museum itself will be comprehensively transformed when it reopens in 2016. During its closure, SFMOMA has the opportunity to experiment with many aspects of its practice, including investigations into new modes of digital storytelling, user-experience design for museum collections, and the visualization of collection data. In 2014, two initiatives exemplifying this spirit of experimentation were launched: the museum began building a Collection API designed to open up access to SFMOMA’s multivalent collection data amassed over the past seventy-five years; and the Web + Digital department convened the SFMOMA Lab, an interdepartmental research group that investigates topics at the intersection of art, design, and technology. When thinking about creative ways to test the new Collection API, and in keeping with SFMOMA’s dedication to cultivating artistic innovation in dialogue with artists and creative technologists, SFMOMA Lab members contacted Stamen Design, a San Francisco–based technology design studio. Together, SFMOMA and Stamen created a participatory design event that paired developers and code artists with museum staff to interrogate the API in an event dubbed Art + Data Day.


In recent years, several cultural institutions have begun developing public APIs to access and explore their online collections. Within the last year alone, new or relaunched APIs have been produced by the Walters Art Museum in Baltimore, the Cooper Hewitt Smithsonian Design Museum in New York, and the Rijksmuseum in Amsterdam. These new APIs coming out of the museum sector emerge from a growing global interest in releasing cultural data for free public use. Included in SFMOMA’s mission is the mandate to make the art for our time a vital and meaningful part of public life. The API is designed to grant access to the massive amount of information associated with the institution’s artworks, the people involved in their collection, current and archival programming and exhibitions, and the museum’s history. Opening up this data and making it available to the broadest possible audience serves as another point of engagement with the art and artists contained in the museum’s collection.

Aside from an unabashed excuse to geek out and create beautiful data visualizations and nerdy software, Art + Data Day provided a meaningful opportunity for SFMOMA to better understand its API—how it might be employed and how it might be improved in future development. This paper presents the findings from Art + Data Day, their implications for SFMOMA’s explorations of digital storytelling both online and on site, and how the API can be used to implement these new experiences. Lastly, the SFMOMA Collection API will be discussed, including a detailed analysis of how Art + Data Day influenced its ongoing development.

2. Participatory design

Formatted as an “un-hackathon,” Art + Data Day focused on collaboration and problem solving as a way of testing an alpha version of the Collection API. Whereas hackathons generally reward speedy coding over effective problem solving, participatory design events like Art + Data Day create noncompetitive environments that foster idea sharing and learning. These one-day events can produce working prototypes and often result in new working relationships. For this event SFMOMA Lab brought together twenty-three designers, technologists, content specialists, and museum professionals. The composition of this group intentionally comprised the ideal variety of people who might one day use the Collection API, and whose needs ranged from pragmatic and administrative to complex and experimental. The group formed four teams, each of which developed a project that explored different facets of SFMOMA’s collection using the API. For SFMOMA, the primary goal of Art + Data Day was not to implement working prototypes or produce finished code, but rather to engage participants in open dialogue about the API’s potential and the innovation it could facilitate.


At its most basic, the API provides access to images and information about artworks and artists in the SFMOMA collection. Matthew Hrudka, SFMOMA’s design technologist and the developer behind the API, made the decision to hold back additional features from the alpha version of the API. This choice was made intentionally at the onset of the project in an attempt to balance intuitive simplicity with the progressive addition of features as they were determined to truly enhance engagement with the collection data. Art + Data Day was a collaborative exploration designed to uncover and define some of those essential features.

3. Project definition

Art + Data Day served as a user-testing session of those intentions and possibilities, offering both a collaborative framework for investigating common themes and interests, and also free-form time for experimentation. Participants were encouraged to think beyond the Collection API’s alpha framework and ask questions like: “What does photography from the seventies look like?” or “Can an artwork take a selfie?” or “Do external factors, like annual rainfall, influence the creation of an artwork?” or “Can an artwork be in a bad mood?”

The day began with a vision exercise, wherein participants wrote down project ideas, questions, and technical interests on sticky notes and posted them onto a whiteboard. This idea cluster was then reviewed by the group to find common topics, shared interests, and cross-cutting themes. Dozens of ideas were discussed during this process, but a small number of recurring topics quickly emerged, including color dominance, the context in which an artwork was created, artists’ lives, and visualization of anomalies in the collection data. This snapshot of the group’s thinking provided the foundation for a general understanding of the day’s explorations.


Throughout the day, the teams checked in with the overall group and periodically consulted with domain experts from SFMOMA’s Digital, Interpretive, Education, Design, IT, and Collections departments. Describing their projects to audiences or mentors helped the teams formulate their ideas in narrative form. At the end of the day, each team made a final presentation of their project to the group, allowing time for additional questions and feedback.

4. Building teams

As participants reviewed and discussed the focus areas revealed by the vision exercise, they began to articulate concrete ways to examine them, and like-minded groups formed around specific projects. Team Pixelmasher, led by new media artist Micah Elizabeth Scott, created new visualizations of artworks in aggregate by examining the pixel data of the images in SFMOMA’s collection. Team Context, led by University of San Francisco assistant professor of design Scott Murray, investigated pairing the SFMOMA Collection API with external data sets to deliver related content and provide greater context around artworks and artists. Team Selfie, led by Flickr engineer Bertrand Fan, explored how visitor-generated photographs of artworks in the SFMOMA collection can show the relationships people have to them. And finally, John Higgins, SFMOMA’s lead software architect, who works remotely, submitted a project that analyzed the emotion conveyed by artwork titles in order to explore whether this sentiment could be measured over time, across all of an artist’s works in the collection.

Team Pixelmasher

Scott was joined by Nadav Hochman, a doctoral student using computational methods for visually analyzing large-scale image data; Ian Smith-Heisters, a design technologist at Stamen; Keir Winesmith, head of web and digital platforms at SFMOMA; and Bridget Carberry, web production coordinator at SFMOMA. First, the team identified thematic buckets of artworks that would serve as the project’s primary data set. Initially, they chose a specific year, but quickly discovered that since only a percentage of artworks in the collection are digitized, restricting their search to a single year would not yield enough images. In order to return a more useful result, Team Pixelmasher turned its attention to an entire decade, the 1970s. This approach proved much more effective, as approximately 70 percent of the photography collection from that era has associated digital imagery available via the Collection API. Once the source material had been established, Scott used a script she had previously written that downloads, processes, and overlays large numbers of images to create a kind of virtual long exposure. A visualization of the cumulative pixel data contained in photographs created in the 1970s revealed new ways to look at a particularly rich area of SFMOMA’s collection. Whereas the team had expected to see a representation of the technical transition away from black and white photography, the visualization instead illustrated patterns that revealed variations in contrast, density, and aspect ratio.
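The "virtual long exposure" described above can be approximated by averaging many images pixel by pixel. The following is a minimal sketch of that idea, not Scott's script, which is not reproduced in this paper. It assumes grayscale images that have already been resized to identical dimensions and are represented as nested lists; the function name is illustrative.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of 0-255 intensities


def long_exposure(images: List[Image]) -> Image:
    """Average equally sized grayscale images pixel by pixel."""
    if not images:
        raise ValueError("need at least one image")
    height, width = len(images[0]), len(images[0][0])
    for img in images:
        if len(img) != height or any(len(row) != width for row in img):
            raise ValueError("all images must share the same dimensions")
    count = len(images)
    # Each output pixel is the mean of that pixel across every input image.
    return [
        [round(sum(img[y][x] for img in images) / count) for x in range(width)]
        for y in range(height)
    ]
```

Because this sketch requires uniform dimensions, it would hide the aspect-ratio variation that the team's visualization revealed. Overlaying images at their native proportions would need an extra placement step.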


Continuing with the team’s approach to visualize data from the API images, Hochman created a new view in aggregate and contrast. Utilizing code he had previously written to analyze large quantities of images, Hochman extracted a five-pixel slice from the center of each image in the set. He then arranged these narrow glimpses of each image horizontally. The result had a striking resemblance to the spines of a record collection. The varying height of the samples represents the varying file size of images as they were available through the API. Regenerating the visualization using high-resolution images normalized for actual artwork size would undoubtedly result in a new presentation of the same data set.
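A rough sketch of the slicing approach follows: take a narrow vertical strip from the center of each image and lay the strips side by side. This is not Hochman's code. Images are assumed to be nested lists of pixel values, and the choice to top-align strips of differing heights and pad them with a fill value is an assumption made for illustration.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values


def center_slice(image: Image, width: int = 5) -> Image:
    """Return a vertical strip `width` pixels wide from the image's center."""
    total = len(image[0])
    width = min(width, total)
    start = (total - width) // 2
    return [row[start:start + width] for row in image]


def spine_montage(images: List[Image], width: int = 5, fill: int = 0) -> Image:
    """Place each image's center strip side by side, padding shorter strips."""
    strips = [center_slice(img, width) for img in images]
    height = max(len(strip) for strip in strips)
    montage = []
    for y in range(height):
        row: List[int] = []
        for strip in strips:
            strip_width = len(strip[0])
            # Strips shorter than the tallest one are padded at the bottom.
            row.extend(strip[y] if y < len(strip) else [fill] * strip_width)
        montage.append(row)
    return montage
```

Leaving strip heights unnormalized, as here, is what produces the uneven "record spine" effect when source images differ in pixel height.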


Next, Smith-Heisters wrote a script to display an endlessly looping animation of randomly chosen photographs from the 1970s in the SFMOMA collection. The animation ran at sixty frames per second, but Smith-Heisters built a pause function into the program, which allowed a viewer to attempt to freeze the sequence on a specific image. This feature created an impossible game of close-looking that invited users to test their reflexes to try to stop the animation if an image caught their eye.

The final project that came out of Team Pixelmasher was Search the Seventies, a basic search engine for artworks created in the 1970s. After experiencing the abstracted and animated ways of looking at SFMOMA’s artwork from the seventies simultaneously, Carberry wanted to engineer a more leisurely way to browse the selected data set. Working from a tool written by Winesmith earlier in the day to search artwork titles, Carberry modified the code to search and display only artworks created in the 1970s. Whereas Winesmith’s code returned a list of artworks with the query string in their title, listed in order of relevance, Carberry’s application took a query and returned only the images of artworks with that query in the title. This mode of searching a specific time period for a keyword and returning a page full of images provided a simple but satisfying way to look at images at the user’s own pace.
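The behavior of Search the Seventies can be sketched as a filter over artwork records: keep works dated within the decade whose titles contain the query, and return only their images. The record fields used here ("title", "year", "image_url") are hypothetical and do not reflect the Collection API's actual field names, and the sketch works on records already fetched rather than calling the API.

```python
from typing import Dict, List

Artwork = Dict[str, object]  # hypothetical record shape, not the real API schema


def search_decade(
    artworks: List[Artwork],
    query: str,
    start_year: int = 1970,
    end_year: int = 1979,
) -> List[str]:
    """Return image URLs of works in the year range whose title contains query."""
    needle = query.strip().lower()
    results: List[str] = []
    for work in artworks:
        year = work.get("year")
        title = str(work.get("title", ""))
        # Skip works with unparsed dates or dates outside the decade.
        if not isinstance(year, int) or not (start_year <= year <= end_year):
            continue
        # Only works with a digitized image can be shown on the results page.
        if needle in title.lower() and work.get("image_url"):
            results.append(str(work["image_url"]))
    return results
```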


Team Context

While Team Pixelmasher spent the day manipulating the pixel data associated with images in the Collection API, Team Context looked outside of the SFMOMA collection for opportunities to pair external data sets with the API and deliver related content to provide deeper meaning about artworks and artists.

Team Context explored ways to connect individual artworks with the context of the world in which they were created. Murray was joined by Dan Rademacher, a project manager for Stamen Design; Mathieu Stemmelen, a graphic designer at SFMOMA; Stella Lochman, a program associate for public dialogue at SFMOMA; and Victor Powell, a data visualization artist. The team began their investigations by creating a mind map that situated an artwork or an artist as the central origin point from which the group moved out to discover infinite avenues of investigation. The mind map covered everything from the artist’s personal relationships and artistic influences to political, economic, and weather data to that year’s World Series winner. Taking a step back, the team members then asked themselves, “How do we start to get our head around such a vast array of possible avenues of pursuit beginning from any one of the 70,000 artworks available via SFMOMA’s API?” In response, the group decided to focus on a single artwork by San Francisco Bay Area artist Robert Arneson. It quickly became clear that Arneson’s self-portrait sculpture, California Artist, would provide a rich contextual foundation from which to extrapolate.

The visual prototype Team Context presented consisted of three vertical columns: the left column illustrated cultural and regional factoids from the year when the artwork was created, such as who was the president and Pantone’s color of the year; the central column illustrated the artwork’s “tombstone information” (title, dimensions, media, etc.); the right column noted biographical context for the artist, including his contemporaries and where he went to art school. Beneath these three columns appeared a timeline of Arneson’s life that visualized the number of his works in SFMOMA’s collection over the span of his life, with different circle sizes representing the number of works created by Arneson in a particular year. This timeline makes it immediately apparent that the majority of Arneson’s work contained in SFMOMA’s collection was created when he was in his late thirties.
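The timeline's underlying data reduces to a count of works per creation year, which is then mapped to a circle size. The sketch below shows one way to do this. It assumes creation years are already available as integers, and it scales circle area (rather than radius) in proportion to the count; the paper does not state which encoding the team used.

```python
import math
from collections import Counter
from typing import Dict, List


def works_per_year(years: List[int]) -> Dict[int, int]:
    """Count artworks per creation year, ordered chronologically."""
    return dict(sorted(Counter(years).items()))


def circle_radius(count: int, max_count: int, max_radius: float = 20.0) -> float:
    """Scale the radius so that circle area is proportional to the count."""
    return max_radius * math.sqrt(count / max_count)
```

Scaling by area avoids visually exaggerating busy years, since doubling a radius quadruples the area a viewer sees.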


Initially, the team was concerned that the context buttressing the life of a lesser-known artist such as Arneson would not be robust enough to create meaningful data visualizations. What the group found, however, was that although there was an abundance of rich context, the real challenge was converting it into meaningful data. Associating the most recent census records from the year that California Artist was created (1982) is relatively straightforward, but what does it mean? How does it change or evolve the way art historians or visitors understand the artwork itself? The timeline that Team Context created placed the artwork within the context of Arneson’s lifespan using the Collection API in a simple yet compelling way. The team learned, however, that adding more layers of context beyond that would remain a human-driven endeavor, rather than a data-driven one, and that there’s currently no danger that automation will replace the careful work that curators and educators do to communicate the relevant context around an artwork.

Team Selfie

Whereas Team Context looked to external sources to garner content about an artwork, Team Selfie looked at the interaction between an artwork and its viewer to ascertain how and if user-generated photographs of an artwork inform its meaning.

Team Selfie examined the many ways that visitors capture artworks on camera and investigated how these interactions might become part of the rhetoric of engagement around an artwork. In addition to Fan, the team included Tim Svenonius, senior content strategist at SFMOMA; Bosco Hernández, art director for the SFMOMA Design Studio; Eric Gelinas, design technologist at Stamen Design; and Anna Carey, an intern in SFMOMA’s web and digital department. Team Selfie posed a series of questions probing how the museum-going community views and shares its experience of art. The team began with the notion that photos of SFMOMA’s collection found around the Internet on Google Images, Flickr, Instagram, or Foursquare (among others) constituted a compelling digital manifestation of visitor interactions. If this notion were true, the group wondered, what could be interpreted from the way someone chooses to [photographically] frame a work of art? Do they include themselves in the image? Are they facing the artwork or the camera? From what angle is the image captured?

In its online searches, the team discovered that one of SFMOMA’s most famous works, Mark Rothko’s No. 14, 1960, is almost always photographed in the exact same way: a wide shot with viewers standing or sitting on the bench in front of the painting, facing it head on.

Yet a search for Jim Campbell’s Exploded Views returns images of the artwork captured from a wide variety of perspectives, from all possible angles and zooms, and multiple lighting schemes.

In order to make sense of these variations, the group designed a web application that aimed to visualize the divergences and parallels that emerge when visitors take and post pictures of an artwork. The image pulled from the SFMOMA Collection API served as the side-by-side reference point for an animated slideshow of images of the same artwork drawn from the Flickr API. This feature was programmed by Fan, who created an application to query the Flickr API using the artwork title, artist name, and “SFMOMA.” Gelinas then created a visualization that displayed the image from the SFMOMA API with the artwork’s title and artist alongside the slideshow of the images returned by the query to Flickr. Seeing the patterns of social image capture for each artwork as they ran through the prototype, it became clear that specific artworks, like Rothko’s No. 14, 1960, compel viewers to construct a very specific framing, while others like Campbell’s Exploded Views inspire hardly any one view at all.
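As an illustration of the query step, the sketch below builds a request URL for the Flickr API from an artwork title, an artist name, and "SFMOMA". The paper does not say which Flickr method or parameters Fan used; combining the three terms in the `text` parameter of `flickr.photos.search` is an assumption, and the API key is a placeholder. The function only constructs the URL and performs no network request.

```python
from urllib.parse import urlencode

FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def flickr_search_url(
    title: str, artist: str, api_key: str, museum: str = "SFMOMA"
) -> str:
    """Build a flickr.photos.search URL combining title, artist, and museum."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,  # placeholder; a real key is required
        "text": f"{title} {artist} {museum}",  # free-text search (assumed usage)
        "format": "json",
        "nojsoncallback": 1,  # ask for plain JSON instead of JSONP
    }
    return f"{FLICKR_REST_ENDPOINT}?{urlencode(params)}"
```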

Further exploring crowd-sourced images of artworks, the team began to investigate edge detection by aligning the painting in each photograph. They found, for example, that if they lined up the borders of the Rothko painting for each image in the slideshow, the emphasis was powerfully placed on visitor interactions, rather than on the painting itself. Iterating on their prototype, they then investigated the potential for face recognition using OpenCV via the npm opencv module to find selfies taken with the artwork. Although it was easy to detect faces head-on to determine whether a photo was a selfie, this proved less interesting than determining where in the photo the artwork was and centering the photo appropriately on screen. This required using the SFMOMA image as a reference to isolate a pattern of shapes that could also be detected in the Flickr photo. The complexity of integrating the pattern detection algorithms into the prototype, however, was too time-consuming for Art + Data Day. Instead, Team Selfie decided to simply center the Flickr photos in their quadrant of the page, maintaining their aspect ratio.
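The fallback layout, centering each photo in its quadrant while maintaining aspect ratio, comes down to a small calculation. The sketch below also scales each photo to fit its quadrant, which is an assumption: the paper states only that photos were centered with their aspect ratio preserved.

```python
from typing import Tuple


def fit_and_center(
    photo_w: float, photo_h: float, box_w: float, box_h: float
) -> Tuple[float, float, float, float]:
    """Return (x, y, width, height) for a photo scaled to fit and centered in a box."""
    if min(photo_w, photo_h, box_w, box_h) <= 0:
        raise ValueError("all dimensions must be positive")
    # A single scale factor for both axes preserves the aspect ratio.
    scale = min(box_w / photo_w, box_h / photo_h)
    width, height = photo_w * scale, photo_h * scale
    return ((box_w - width) / 2, (box_h - height) / 2, width, height)
```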

Artwork sentiment graph

The primary experience with a work of art is emotional and, arguably, it is the purpose of artistic expression to elicit that emotional experience. Working remotely, Higgins investigated whether the titles artists give to their artwork accurately frame the intended emotional response that the work evokes.

Sentiment analysis algorithms are used to find and extract subjective information found in text, with the aim of determining the attitude of the writer. This technique, although imperfect, is commonly used by businesses to identify the sentiment of social-media comments or reviews, making positive sentiment a virtual currency for marketers. The result is the diminished context and nuance of language that, for these purposes, is distilled to a positive or negative numeric value. If artworks, like advertisements, are intended to spark an emotional response, then applying sentiment analysis to the titles of an artist’s artworks will generate an additional data metric. Although this metric itself may be subjective in what it tells us about the artwork or the artist, it does reveal the artist’s tendency over time to express a more positive or a more negative sentiment.

The program Higgins wrote generated a graph of the sentiment analysis for all artworks by a chosen artist in SFMOMA’s collection. Higgins queried whether artists tended to title their artworks more positively or negatively over time. Would a series of positively titled artworks be followed by a corresponding number of negative ones? Was there any correlation between where artists were from geographically and the sentiment of their artwork titles?

Two APIs were used to build the sentiment analysis: the SFMOMA Collection API and the API of a sentiment analysis engine. The Collection API supplied a list of titles and dates for an artist’s artworks; each title was then passed to the sentiment analysis engine API, which in turn responded with a numeric value between -1 and +1.

The most negative value for a piece of text is -1. For example, The Lonely Metropolitan, a photomontage piece by Herbert Bayer, scores -0.92, a satisfyingly high negative score. Disturbingly, however, the installation titled Pornography in the Classroom [archive master for monitor, “Prick”] registers a highly positive score of 0.76.
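The two-step pipeline (fetch titles and dates, then score each title) can be sketched as follows. The scorer here is a toy word-list stand-in for the sentiment analysis engine, which this paper does not name, so its scores will not match the values reported above. The lexicon and the (title, year) input shape are likewise assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Tiny illustrative lexicon; a real engine uses a far richer model.
TOY_LEXICON: Dict[str, float] = {
    "lonely": -0.8,
    "dark": -0.5,
    "joy": 0.9,
    "bright": 0.6,
}


def toy_sentiment(text: str) -> float:
    """Return a score in [-1, 1]; 0.0 when no word is in the lexicon."""
    words = [w.strip(".,;:!?\"'()[]").lower() for w in text.split()]
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    if not scores:
        return 0.0
    return max(-1.0, min(1.0, sum(scores) / len(scores)))


def sentiment_series(
    artworks: List[Tuple[str, int]],
    score: Callable[[str], float] = toy_sentiment,
) -> List[Tuple[int, float]]:
    """Return (year, score) pairs sorted by year, ready to plot."""
    return sorted((year, score(title)) for title, year in artworks)
```

Passing the scorer as a parameter keeps the graphing logic independent of whichever sentiment engine is used. Note that a title with no recognized words scores 0.0, mirroring the neutral scores the real engine assigned to titles such as Untitled.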


Sentiment analysis is programmatically logical and thus incapable of detecting the more complex sentiments of an artwork title such as Skull of a Gorilla, a visually dark and complex painting by Francis Bacon. Sentiment analysis gives Skull of a Gorilla an indifferent score of 0, in contrast to the truly arresting emotional response elicited by the visual experience of this artwork. Additionally, Higgins soon discovered that many artwork titles, such as Untitled or numbers in a series or edition, were scored as neutral by sentiment analysis, and thus returned a numeric value of zero. According to sentiment analysis, these neutral titles have no currency.

The sentiment analysis graph page is built entirely from front-end code: JavaScript, with some enhancements via jQuery to help get data from the two APIs. Developed quickly and with a limited selection of only the first 100 artists from the Collection API, the art sentiment graph provides a framework for engagement and spurs additional curiosity. It is a wonderful example of how a clean API can facilitate sketching with code. Future developments to the sentiment graph might make it much more interactive or possibly display an image of each artwork graphed. As the Collection API can return geographic information, this data might be used to generate a map of sentiment data and reveal whether some areas reflect greater negativity or positivity.

5. Further development of the API

As expected, the outcomes of Art + Data Day both affirmed and challenged SFMOMA’s assumptions about how the API would be used. As discussed earlier, the alpha version of the Collection API was intentionally designed to do two things well: return artists and return artworks. This intentional simplicity allowed for a seamless implementation during the event, with one participant remarking that the SFMOMA API “was one of the nicest and easiest I have used, because it did just what I expected it to.” For an art museum interested in building layers of interpretation and meaning around a work of art, adding the complexity of linked-data capabilities to the API to make complex connections is imperative. For hackers, artists, and creative technologists, however, the simplest format is often the best starting point.

The way that the API was used by both museum staff and creative technologists during Art + Data Day confirmed Hrudka’s initial design, which separates the way the museum serializes the data from how it is presented. This design allows the museum to work with the data in one way, while simultaneously providing a flexible way to present it through the API. The simple presentation mode best serves the initiate, the data hacker, artists, and designers, while a mode for delivering data in a machine-readable format—such as HAL or linked-data standard formats—enables complex connections with other museum collections and data sources. It was clear from projects like Team Context’s that using linked collection data will be necessary for the API’s success. Hrudka believes that by developing the appropriate linked-data vocabularies as context definitions for artists, artworks, and exhibitions, museums can share these vocabularies for machine processing, while still maintaining human readability that encourages playful hacking and exploration.
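The separation of stored data from its presentation can be illustrated with two presenter functions over the same record: one returning a flat, human-friendly structure and one returning a HAL-style document with a `_links` object. This is a sketch only and not Hrudka's implementation. The record fields, the base URL, and the URL paths are all hypothetical.

```python
from typing import Dict


def present_simple(record: Dict[str, object]) -> Dict[str, object]:
    """Flat presentation aimed at newcomers and quick experiments."""
    return {
        "id": record["id"],
        "title": record["title"],
        "artist": record["artist_name"],
    }


def present_hal(
    record: Dict[str, object], base_url: str = "https://api.example.org"
) -> Dict[str, object]:
    """HAL-style presentation: the same data plus machine-followable links."""
    return {
        "_links": {
            "self": {"href": f"{base_url}/artworks/{record['id']}"},
            "artist": {"href": f"{base_url}/artists/{record['artist_id']}"},
        },
        "id": record["id"],
        "title": record["title"],
    }
```

Both presenters read from one internal record, so the museum can change how it stores data without changing either public shape.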

What’s to come

Future development of the API will expand search capabilities with Boolean and range operations, as well as faceting across artists, artworks, and exhibitions. Although most search appliances already provide this, their complexity and additional security considerations are reasons to implement a more intuitively simple wrapping. A good search interface should correct for misspelled search terms. This same mechanism could also be surprising. Inspired by the experience of looking for a book in a library and finding something interesting or unknown nearby, the SFMOMA Lab team is exploring approaches that allow and encourage a greater degree of discovery. These features remain in development, as Hrudka figures out just how to programmatically facet, link, suggest, and present counterpoints or unique aspects of the data that a hacker, developer, or casual user might not otherwise discover. This idea of “Yes, AND…” goes beyond the simple presentation of data about the collection and is a feature that is, in and of itself, an argument for creating an API, and one SFMOMA hopes to see developed further by others.
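Correcting for misspelled search terms can be prototyped with approximate string matching against a list of known terms. The sketch below uses Python's standard difflib module as a stand-in; it is not the mechanism SFMOMA plans to use, and the vocabulary and similarity cutoff are assumptions.

```python
import difflib
from typing import List, Optional


def suggest_term(
    query: str, vocabulary: List[str], cutoff: float = 0.6
) -> Optional[str]:
    """Return the closest known term to a possibly misspelled query, if any."""
    # Compare case-insensitively but return the term in its original casing.
    lowered = {term.lower(): term for term in vocabulary}
    matches = difflib.get_close_matches(
        query.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None
```

Raising `n` above 1 would return several near matches instead of one, which is one simple route toward the serendipitous "nearby on the shelf" suggestions described above.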

The SFMOMA Lab team has also been setting up a new high-resolution tiling image server, although it was not ready for Art + Data Day. This new server will allow users to request images at various aspect ratios and sizes and to make batch requests. Additionally, responses from the API will use the image server to include metadata about the available images for a given object.

Moving forward, the collection data requires greater normalization to facilitate the type of investigations and applications for which it will be used both administratively and publicly. For example, dates are expressed in a human-readable way that is difficult to work with programmatically, such as “circa 1930,” “1930s,” or “before 1900.” Parsing dates like these is difficult; however, the possible visualizations and resulting insights derived from this kind of anomalous data tell an important story within the history of art. To simplify this issue, Hrudka is considering serializing all dates in Extended Date/Time Format (EDTF) and implementing presentation methods for the API to display human-readable strings.
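To make the date problem concrete, the sketch below converts the three example strings into EDTF notation: "~" marks an approximate date, "X" stands for an unspecified digit, and "../" opens an interval with no defined start. The mapping is one possible interpretation rather than SFMOMA's chosen scheme. In particular, reading "before 1900" as an open interval ending in 1899 is an assumption, and strings the patterns do not recognize are returned as None for human review.

```python
import re
from typing import Optional


def to_edtf(display_date: str) -> Optional[str]:
    """Convert a few human-readable date patterns to EDTF strings."""
    text = display_date.strip().lower()

    match = re.fullmatch(r"(?:circa|ca\.?|c\.)\s*(\d{4})", text)
    if match:
        return f"{match.group(1)}~"  # approximate year

    match = re.fullmatch(r"(\d{3})0s", text)
    if match:
        return f"{match.group(1)}X"  # decade with unspecified final digit

    match = re.fullmatch(r"before\s+(\d{4})", text)
    if match:
        return f"../{int(match.group(1)) - 1}"  # open start, ends the prior year

    if re.fullmatch(r"\d{4}", text):
        return text  # a plain year is already valid EDTF

    return None  # unrecognized: leave for manual cataloguing
```

For example, `to_edtf("circa 1930")` returns `"1930~"` and `to_edtf("1930s")` returns `"193X"`. The original display string would still be kept for presentation.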

Various field names must also be refactored to make the structure and meaning of the data as clear as possible to those without the specific knowledge of museum curators and registrars. An example of this misinterpretation came up during Art + Data Day, when participants misunderstood what was meant by the term “display” when used in field or property names. Technologists would normally understand the term “display” to mean “for display on screen.” The museum context, however, threw them off, and they associated the term with an artwork’s exhibition, or display within the museum itself. In fact, the museum employed the term in the same way as technologists would, namely to indicate a field that is meant for “display on screen.”

6. Conclusions

There is growing interest and demand for data, particularly around art and museum collections. Art + Data Day illuminated this excitement and revealed many new possibilities for what can be done with a museum’s collection data. Simultaneously, the event clarified some of the fundamental challenges that come with building a museum collection API. For example, a lack of programmatically readable data—the result of hand-entry and years of collecting data in a non-uniform manner—means that there is more work to be done to create a truly robust API. On the other hand, artists working with even the rough data still yielded interesting, creative results.

These learnings point to an important fundamental truth about museum data, or any technology relating to something as subjective as art: processing it, at least for now and the foreseeable future, requires human interpretation and human touch. Ultimately, APIs are not intended as human interfaces. Without an inherent readability, they will likely fail to become tools for exciting interactions designed with humans as the end user. The Collection API, therefore, must be both human readable and machine processable. At the time of Art + Data Day, SFMOMA prioritized readability over processability in order to tap the creative skill sets of the day’s participants.

The relationship between the technology behind the Collection API and the humans who will both add to and draw from its repositories is still in a nascent stage. Art + Data Day was but one early exploration into this new world. Continued dialogue and design with museum staff, hackers, artists, designers, technologists, and the public will be critical as we continue to build on these experiences and foster creative invention and research.

Cite as:
. "Art + Data: Building the SFMOMA Collection API." MW2015: Museums and the Web 2015. Published January 30, 2015. Consulted .