What the Fonds?! The ups and downs of digitising Tate’s Archive
Abstract
In October 2014, Tate launched online the culmination of a project to digitise a selection of objects from Tate’s Archive. Archives have been digitised before, but the unique ambition for the project was to integrate our archive collection into the same interface as our art collection in the "Art & artists" section of the Tate website. In this paper, two representatives, one from the Archive team and one from the Digital team, present the ups, downs, findings, and pragmatic decisions required for a project of this scale. We cover how to develop a sustainable process for digitising archive material in a museum/gallery context, as well as our approach to, and lessons learnt from, integrating two collections on the website front end, including developing a usable search interface, presenting complex information as simply as possible, and the difficulties of using the term "Archive" online.
Keywords: collection digitisation, archives, online collections, sustainable processes, user-centred navigation
1. Introduction
In October 2014, Tate launched online the culmination of a project to digitise a selection of objects from Tate’s Archive. The unique ambition for the project was to integrate our archive collection into the same interface as our art collection: the “Art & artists” section of the Tate website (http://www.tate.org.uk/art). This paper is from two points of view—the Archive team and the Digital team—and will present the ups, downs, findings, and pragmatic decisions required for a project of this scale.
Tate Archive is the largest archive in the world of British art from 1900 to the present day (Tate, 2013). It holds over one-million items that relate closely to Tate’s art collection, from artists, art-world figures, and art organisations. This paper will firstly consider how archive objects were selected for digitisation, outline the process we chose to take for digitisation—including digitising objects down to the individual page level—and explore how to establish a sustainable digitisation process that can continue past the end of a project’s lifespan.
The second part of the paper focuses on our approach to integrating two collections on the website. We discuss our research, decisions, and lessons learnt for others presenting collection records online, which covers: developing a search interface, presenting complex information as simply as possible, and what gallery terminology really means to our users.
2. About the digitisation project
Our archive digitisation project—known as the Archives & Access project—was initiated in April 2012 following a grant from the United Kingdom’s Heritage Lottery Fund (HLF) (Heritage Lottery Fund, 2014). The overall aim of the project was to digitise approximately twenty-thousand items from the Tate Archive, which would result in over fifty-thousand individual photographs, and to make these items available online so that they could become accessible to national and international audiences. Material was taken from seventy-six different archive collections representing fifty-two artists who lived or worked in Britain. The project has also included a five-year national learning project, as well as a new gallery dedicated to displaying the archive at Tate Britain (Tate, 2013).
Because the project was grant funded, the digitisation process had already been partially conceived during the original bid phase. The bid included two core governing principles: to take a sustainable approach and an integrated approach, which were to guide the decision-making framework and delivery of the digitisation project.
Governing principle: A sustainable approach
The “sustainable approach” principle meant setting mechanisms and workflows behind the scenes to enable future archive digitisation to happen within existing channels in the organisation, rather than creating bespoke one-off digitisation projects that would take different approaches each time. For Tate, this resulted in a large cross-departmental project team, including the Curatorial, Photography, Learning, Library and Archive, Information Systems, Conservation, Digital, Development, Legal, and Finance departments, to help deliver these new processes.
Governing principle: An integrated approach
The “integrated approach” principle meant that rather than publishing the digitised archive records through a separate back-end system into a bespoke separate location online, the ambition of the project was to integrate the archive records into Tate’s existing technical architecture for publishing the art collection online and publish them into an integrated front-end interface (Tate, 2000). Whilst this would present its own challenges, the opportunities afforded by this approach included creating links between the two collections and ensuring that the archive collection would benefit from larger audiences and future functionality and design updates rolled out across the art collection.
3. From the Archive perspective
Tate Archive was established as a public collection in 1969. It contains original, unpublished material relating to the national collection of British art. Its collections provide a unique insight into the lives and practices of artists who have lived and worked in Britain since the end of the nineteenth century.
When Tate was granted funding by the HLF, the money was awarded on the basis that Tate would deliver outcomes following the principles that the HLF stands for (Heritage Lottery Fund, 2015): outcomes for heritage, for people, and for communities.
Because we asked for a large HLF grant, we needed to make sure our project met all three of these principles. This influenced what material we selected and how we disseminated the digitised material. Our project needed to impact on the heritage in the United Kingdom, have a positive effect on our local and national community, share skills, and be sustainable.
We tackled these aims in several different ways: through the selection of the material, through skills sharing with trainees and volunteers, with coordinated outreach, and with a technical digitisation approach.
Step one: Selection of material
The original Tate Gallery at Millbank in London opened in 1897 (Tate, 2012a). Its official name was the National Gallery of British Art, but it became popularly known as the Tate Gallery after its founder, Sir Henry Tate. The gallery was designed to house the collection of nineteenth-century British painting and sculpture given to the nation by Sir Henry Tate. In 1917, the gallery became responsible for the national collection of international modern art and for British art from 1500 on. To make the collections more accessible nationally, the Tate Trustees opened the first Tate gallery outside London in Liverpool in 1988, and the second in St Ives in 1993. To fulfil our responsibility to promote public enjoyment, knowledge, and understanding of British and international art (Tate, 2012b), we decided that our selection of archive material should follow these principles and reflect that this collection belonged to the nation.
In terms of our outcomes for heritage, as we have archives in our collection that relate not only to our local area but to the whole of the United Kingdom, we decided to select material that would highlight different geographical areas. As Tate is one of the nineteen national museums funded by the UK Government through the Department for Culture, Media and Sport, we felt that it was important that we should represent our national collections and fully engage the people who pay for our service. This meant that our heritage aims would fall beyond that of our immediate locality in London and would include as much of the UK heritage as possible. The material was grouped into the following regions: UK-wide, Scotland, North West (including Cumbria and Lancashire), North East (including Yorkshire and Northumbria), Northern Ireland, Midlands, East, Wales, London and Home Counties, South West, and South East. These regions were also integrated into projects with our learning team that ultimately will see us working on outreach projects with partners in Wales, North East (Tyne and Wear), North West (Liverpool), South East (Margate), and London (all boroughs) to utilise the digital material from the archive.
We selected artists that we felt best represented these geographical areas: not only if they were born there, but also if they spent a considerable amount of time working in an area. We also used this opportunity to select material from groups that had, so far, been underrepresented in the collection, such as female and black British artists.
The selection process was also directly shaped by the desire to create a sustainable digitisation workflow. We selected items that represented the diverse physical types of material found in the archive. This would both reflect the archive’s close relationship with the art collection and challenge our ability to overcome problems with the digitisation of different formats so that we would have a good basis for future digitisation projects. This meant that we had to create a process that could cope with all the potential material that exists within archive collections, catering for both the standard archive items such as photographs and letters and also for the more complex items such as a sketchbook that contained inserted booklets and folded pieces of paper. Items selected include notebooks, sketchbooks, and scrapbooks, as well as personal correspondence, diaries, drawings, maquettes, photographs, and press-cuttings, which all serve to illuminate the socio-temporal contexts in which artworks were created and bring working practices to life.
Step two: Digitisation process
The challenge for this project was essentially to create a sustainable workflow to enable Tate to display digitised archives and artworks on the same website and, in so doing, fully integrate the archive and art collections. Separate systems at Tate have been established for a long time, and the challenge was to get all the different systems talking to each other. Tate uses a number of well-established off-the-shelf databases for collection management purposes. In the archive, cataloguing is undertaken using a database called CALM. This is the standard proprietary archive cataloguing software in the UK. Artworks are catalogued on a separate database called The Museum System (TMS), from which the cataloguing data is taken for the website. All of our images are stored in an image management system called iBase Manager, from which the images for the website are generated. A bespoke system built by Tate called CIS (Collection Information System) pulls together the images from iBase and the data from TMS. Our first step in the integration process was to get all of these systems working with archive data by drawing records from CALM.
Archive data is different to the art collection. This is because it is not catalogued in a linear fashion, but hierarchically, where similar types of items are grouped together in a series and cataloguing information is placed at the most appropriate level of the catalogue. For example, information that relates to a whole collection (Fonds) is catalogued at the top level, information that relates to a series of items is catalogued at series level, and information that relates to one item is catalogued at item level. It was agreed that, because the systems were already in place for the transfer of data from TMS to the website, the archive cataloguing information would be pulled from CALM—its native database—into TMS. This would result in all the information for the website being pulled from the same system. In order to achieve this, the two databases were mapped to match fields in CALM to TMS, to make sure that the same information was in the correct place in both systems, and a script was written to pull the information from one database into the other.
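To make the mapping step concrete, the following is a minimal sketch of how a hierarchical catalogue might be flattened into one row per record while keeping a pointer to each record's parent. It is not the script used in the project: the field names on both sides of `CALM_TO_TMS`, the `ParentObjectNumber` column, and the reference numbers are all assumptions made for illustration.

```python
# Hypothetical field mapping; the names are illustrative,
# not the real CALM or TMS schemas.
CALM_TO_TMS = {
    "RefNo": "ObjectNumber",
    "Level": "ArchiveLevel",
    "Title": "Title",
    "Date": "Dated",
}


def flatten(node, parent_ref=None):
    """Walk a hierarchical catalogue tree and yield one flat row per node."""
    # Copy across only the mapped fields that this node actually has.
    row = {tms: node[calm] for calm, tms in CALM_TO_TMS.items() if calm in node}
    # Record the parent so the hierarchy can be rebuilt from the flat rows.
    row["ParentObjectNumber"] = parent_ref
    yield row
    for child in node.get("children", []):
        yield from flatten(child, node["RefNo"])


# Made-up example: a fonds containing one series containing one item.
fonds = {
    "RefNo": "X", "Level": "Fonds", "Title": "Papers of an artist",
    "children": [{
        "RefNo": "X/1", "Level": "Series", "Title": "Sketchbooks",
        "children": [
            {"RefNo": "X/1/1", "Level": "Item", "Title": "Sketchbook", "Date": "1930"},
        ],
    }],
}

rows = list(flatten(fonds))
for r in rows:
    print(r["ObjectNumber"], r["ArchiveLevel"], r["ParentObjectNumber"])
```

Information catalogued only at a higher level (here, nothing below the fonds repeats its title) stays at that level in the flat rows; a consumer that needs it for an item would follow the parent pointers upwards.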
Due to the legacy systems already in place, the digitisation process had a number of restrictions placed upon it that impacted on the workflow. Mapping the information from CALM into TMS highlighted the differences between the two cataloguing methodologies. All items in the art collection have a single catalogue entry, even if they are physically located in the same sketchbook; however, in the archive a sketchbook would have a single entry for the entire book. Due to the need to have an individual record for every image on TMS, it was decided that every item in the archive would need to be catalogued to piece (or page) level. This has several advantages: for items in bound volumes that would be considered “artworks,” it allowed more detailed cataloguing, enabled the copyright information to be applied to each single piece (thus making it incredibly precise), and also gave the photography team a reference number for each individual image so they could match their work to the correct cataloguing record.
All material in the archive was then manually catalogued so that every single piece had an individual reference number. This was a time-consuming process, but it enabled the data to be transferred to the other systems in use at Tate.
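As an illustration of piece-level cataloguing, the sketch below expands a single item-level entry into one record per page, each with its own reference number. In the project this work was done manually; the numbering scheme shown here (the item reference followed by a running number) is an assumption, not Tate's actual convention.

```python
def piece_records(item_ref, page_count):
    """Expand one item-level entry into one record per piece (page)."""
    return [
        # Hypothetical scheme: item reference plus a running piece number.
        {"ref": f"{item_ref}/{n}", "parent": item_ref, "sequence": n}
        for n in range(1, page_count + 1)
    ]


# Made-up example: a three-page sketchbook catalogued as item "X/1/1".
pieces = piece_records("X/1/1", 3)
print([p["ref"] for p in pieces])
```

Each record then gives photography a unique reference to match an image against, and gives copyright clearance a unit to attach a status to.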
Once selection had taken place, each item was assessed by a conservator. The collections were surveyed to find items that needed essential treatment. This was defined as treatment that was necessary for digitisation, either to make the document safe to handle or to make obscured information or image visible. Due to time constraints, conservation work and photography started at roughly the same time. This meant that a carefully scheduled plan was worked out between the archive, photography and conservation teams to make sure that all the material would be worked on in the correct order.
Once conservation on items was complete, the next stage of the process was photography. When all the items were catalogued, the data was transferred from CALM to TMS and then on to iBase Manager. This meant that all the cataloguing records were in the image management system (including handling instructions from the conservators and any requests to redact information), enabling exact matching of the images to the catalogue.
In conjunction with this process, the copyright of every item was cleared. We took the decision to ask copyright holders for a Creative Commons licence for all the archive material so that these images could be reused for non-commercial and educational purposes (Tate, 2014). All copyright information was tracked in TMS, which was another reason for us to pull the cataloguing information through to this system. Once the cataloguing data was in place, the item photographed, and the image copyright cleared, it was ready to be displayed on the Tate website.
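The three conditions for publication described above can be expressed as a simple check. This is a sketch only: the `Piece` class and its flags are hypothetical and do not reflect how TMS or CIS actually store this status.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    """Hypothetical status record for one catalogued piece."""
    ref: str
    catalogued: bool = False
    photographed: bool = False
    copyright_cleared: bool = False


def blockers(piece):
    """List the workflow stages still outstanding for a piece."""
    checks = {
        "cataloguing": piece.catalogued,
        "photography": piece.photographed,
        "copyright": piece.copyright_cleared,
    }
    return [stage for stage, done in checks.items() if not done]


def ready_to_publish(piece):
    """A piece goes online only when no stage is outstanding."""
    return not blockers(piece)


piece = Piece("X/1/1/1", catalogued=True, photographed=True)
print(ready_to_publish(piece), blockers(piece))
```

Reporting the outstanding stages, rather than a bare yes or no, makes it easier to see which team a held-back piece is waiting on.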
Step three: Sustainable integration of the collections
The aim for this workflow was to design a sustainable method of getting archive data from its native system onto the Web. A sustainable technical flow of data between systems has been achieved by this project and has enabled the archive to plan future digitisation projects based on the framework created in this workflow.
While the main aim was a sustainable integrated technical infrastructure, a byproduct of this project has been a better integration of the procedures and practices between the archive and the collection. Although located in Tate Britain, the archive was a relatively hidden service for many staff members; the high-profile nature of this project has highlighted the archive as the fantastic resource that it is.
4. From the digital perspective
In 2012, Tate relaunched its website, which aimed to place the art collection at its centre (Stack, 2010). This included larger images, a new search-and-browse interface, and a different name, “Art & artists”, because user testing revealed that the word “collection” did not mean much to those outside the museum sector. These changes were successful, and “Art & artists” is currently the most visited section of Tate’s website, with between 500,000 and 600,000 visits per month (approximately 40 percent of our users (Villaespesa, 2014: 1)).
The biggest opportunities of integrating the archive collection into the “Art & artists” section were that this material would be found or stumbled upon by an already existing and engaged audience, as well as ensuring the content would benefit from future functionality and design updates to the website. However, it also raised a number of challenges that we will explore with three examples: search interface (an integrated model), individual archive records (a hybrid model), and archive context (a new model). We will discuss how we approached each of these, and the decisions and compromises we had to make to launch the project. Finally, we will discuss the challenges around using the term “archives” online.
Search interface: An integrated model
Integrating the digitised archive material into the same search interface as the art collection was a governing principle of the project. By this, we mean integrating the results into the same front-end system, rather than presenting an integrated search interface that returns results from separate systems. The existing “Art & artists” interface was not just for searching, but also for refining or browsing results. Users could either type a direct query into the search box (e.g., an artist or artwork name) or use the browse tools to lead themselves through the collection (Villaespesa, 2014: 9–10, 13) and return results for artwork and artist records separately. Integrating archive material into this interface raised three main questions:
- How should we present the results for artworks, archive items, and artists?
- How should we handle archive items with multiple pieces in the search results?
- How should we create an integrated browse interface when the two collections have different data sets?
Already having a search interface that returned results across two tabs (“Artists” and “Artworks”) meant that the easiest solution for integrating the archive material would have been to introduce a new tab for these results. However, an analysis of this section revealed that users did not notice the artist and artworks tabs, and so adding a new archive tab would not necessarily have made this material more accessible. We therefore decommissioned the tabs and created a new design with results returned in a single interface, regardless of whether they were artworks, archive items, or artist names (which account for over half of the searches (Villaespesa, 2014: 11)). Within this interface, we had to determine the best way to display the multi-piece archive items (such as sketchbooks or letters). In the art collection, J.M.W. Turner’s sketchbooks (e.g., the Ancona to Rome Sketchbook, http://www.tate.org.uk/art/search?gid=65818&limit=100) exist as a model for returning results on a page-by-page basis. While this makes the reason a result is returned more obvious, we also know it returns many unhelpful results, such as blank pages, and overwhelms users with material. For the archive material, we decided to return just the item-level information rather than the individual pieces, so that users could see the object in context, particularly as the material contains continuous text over several pages. These decisions required compromises. In the search results, we display a lot of information on a single page, which pushes matching artist results to the bottom of the first results page and therefore makes them difficult to find. By showing only archive items in the search results, we risk not being clear about why a search result is being returned, as the match may be happening at piece level rather than item level. Further analysis and testing will have to be undertaken to see how people use this page and how we could improve this design.
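The difference between page-by-page and item-level results can be sketched as a grouping step over piece-level matches. The code below is illustrative only and is not Tate's search implementation. It assumes hits arrive as hypothetical `(item_ref, piece_ref)` pairs in relevance order, and it keeps the matching piece references alongside each item, which is one possible way to retain the reason a result was returned.

```python
def collapse_to_items(piece_hits):
    """Group piece-level hits so that each item appears once in the results.

    piece_hits: iterable of (item_ref, piece_ref) pairs in relevance order.
    Items keep the position of their first (best-ranked) matching piece.
    """
    grouped = {}  # dicts preserve insertion order in Python 3.7+
    for item_ref, piece_ref in piece_hits:
        grouped.setdefault(item_ref, []).append(piece_ref)
    return [
        {"item": item_ref, "matched_pieces": matched}
        for item_ref, matched in grouped.items()
    ]


# Made-up hits: two pages of sketchbook "A" and one page of letter "B".
hits = [("A", "A/1"), ("B", "B/3"), ("A", "A/4")]
results = collapse_to_items(hits)
print(results)
```

A blank page that matches nothing never appears, and an item with many matching pages still occupies a single row in the results.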
In the browse interface, we had to determine which facets should be used to explore the artwork and archive collections together. Although we did an analysis of which facets were common across both collections, rather than be led by this—an easier route—we instead based our decisions on user research into how audiences navigated the “Art & artists” section to work out what users were looking for (Fildes & Villaespesa, 2014).
This evidence made it clear that, for example, facets common to both collections, such as the subject index, should be included because it was a popular way to browse content (Villaespesa, 2014: 13). However, it also highlighted other popular facets we should keep even though they were only relevant to one collection; for example, browsing works on display (17 percent of searches) and Turner (who was the most-searched artist), both only relevant to the art collection, or images a user could download under Creative Commons (a fifth of searches wanted this feature), currently only relevant to the archive collection (Villaespesa, 2014: 11).
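A facet that only one collection has data for can still sit in a shared browse interface, because records lacking the facet simply drop out when it is applied. The sketch below illustrates this with made-up records; the field names (`subject`, `on_display`, `creative_commons`) are assumptions and do not correspond to the real site's data model.

```python
def apply_facet(records, facet, value):
    """Keep records whose facet equals value; records without it drop out."""
    return [r for r in records if r.get(facet) == value]


# Made-up records from the two collections.
records = [
    {"id": "artwork-1", "collection": "art",
     "subject": "landscape", "on_display": True},
    {"id": "archive-1", "collection": "archive",
     "subject": "landscape", "creative_commons": True},
]

# A shared facet returns results from both collections...
shared = apply_facet(records, "subject", "landscape")
# ...while single-collection facets narrow to one collection by themselves.
displayed = apply_facet(records, "on_display", True)
reusable = apply_facet(records, "creative_commons", True)
print([r["id"] for r in shared], [r["id"] for r in displayed], [r["id"] for r in reusable])
```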
Our main lesson learnt from the search interface work was that we needed to have started it much earlier. Even though we knew this was the main entry point into the collections, we tackled the piece of work towards the end of the project. This meant that we were not able to implement all of the search facets we wanted to (e.g., offering users the ability to narrow their results by records that have accompanying text); nor perfect the design of some facets (e.g., the “browse by date” feature); nor user test the search interface we went live with. In retrospect, we should have tackled this interface much earlier in the project.
Archive records: A hybrid model
Our approach to the individual archive records was to use the same concept as the artwork records, letting the image lead the page, but also recognising that the design had to accommodate different or new content: longer, descriptive titles; different tombstone categories to the artworks; and a hierarchical relationship between a whole item (e.g., a sketchbook or a diary) and its individual pages. This, therefore, created a “hybrid model” for integration: taking the same concept but adapting it for the new content. Crucially, at this stage, it was also important to define the scope of the project: we were still integrating the archive material into the existing collection interface rather than redoing the whole collection interface. This meant that the digitisation project was not an opportunity to make the “Art & artists” section of the site responsive, which would have made the scope of the project too broad and undeliverable.
One-piece archive items, such as a photograph, follow the same model as the art collection: one image for one set of metadata. The challenge came with representing multi-piece items such as a sketchbook, notebook, or diary. These pieces all have both unique metadata (for example, the page title) and common metadata with the item (for example, the owner). Our solution was to present pieces as an in-page slideshow. Users can always see the item title and core tombstone information while navigating sideways through the pieces of an item (for example, the pages of a letter), with the unique metadata elements changing accordingly.
Within this model, we were also able to introduce more complex elements such as inserts. Our main lesson learnt here was to keep the scale of the complexities in perspective. Whilst, of course, it was important to find solutions for these edge cases (handling inserts or the seven-hundred-page book), we in fact made these our focus. In reality, they only represented a minority of cases, and we should have carried out this analysis of the data earlier to focus on creating a model that worked for the majority of cases first and then fit the “edge” cases around this, rather than the other way around.
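The split between common and unique metadata in the slideshow can be sketched as a layered lookup: each slide reads piece-level values first and falls back to the item-level values. The keys and values below are invented for illustration, and the approach is an assumption rather than a description of how the Tate templates work.

```python
from collections import ChainMap

# Made-up metadata: values shared by the whole item...
item_meta = {"item_title": "Letter to a friend", "owner": "Example owner"}
# ...and values unique to each piece (page).
pieces = [{"page_title": "Page 1"}, {"page_title": "Page 2"}]


def slide(item_meta, piece_meta):
    """Metadata shown for one slide: piece values first, item values as fallback."""
    return dict(ChainMap(piece_meta, item_meta))


slides = [slide(item_meta, p) for p in pieces]
for s in slides:
    print(s["item_title"], "|", s["page_title"])
```

As the user moves between slides, only the piece-level keys change, while the item title and owner stay constant.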
Archival context: A new model
The archival hierarchy—the structure that shows where an archive object sits within its Fonds (archive collection), which can, at its most complex, have the hierarchy of: Fonds; sub-Fonds; series; sub-series; file—is an integral piece of information to understand and contextualise archive objects. It shows both the vertical and horizontal relationships between objects in an archive collection. On the archive object pages, we have presented the hierarchy as both a breadcrumb at the top of the page and as a tree-structure path in the “series” tab under the object image. However, the main challenge was to design a Fonds page and integrate this into the “Art & artists” section when there was no equivalent structure in the art collection.
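A breadcrumb of this kind can be derived by walking from an object up through its parents to the Fonds. The sketch below assumes the hierarchy is available as a simple child-to-parent lookup; the reference numbers, titles, and separator are made up for illustration.

```python
def breadcrumb(ref, parents, titles, separator=" > "):
    """Build a Fonds-to-object trail by following parent links upwards."""
    trail = []
    while ref is not None:
        trail.append(titles[ref])
        ref = parents.get(ref)  # None once the top level (Fonds) is reached
    return separator.join(reversed(trail))


# Made-up hierarchy: Fonds "X" > series "X/1" > file "X/1/1".
parents = {"X/1/1": "X/1", "X/1": "X"}
titles = {"X": "Papers of an artist", "X/1": "Sketchbooks", "X/1/1": "Sketchbook"}

print(breadcrumb("X/1/1", parents, titles))
# prints "Papers of an artist > Sketchbooks > Sketchbook"
```

The same parent links can drive the tree-structure view, which additionally needs each node's siblings to show the horizontal relationships.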
We decided the Fonds page should be browseable via not just the tree-structure hierarchy of the archive collection, but also the search facets, so that it was more accessible to users who were unfamiliar with archive cataloguing techniques. In a round of user testing, we found that leading with the search results of objects worked better for non-archive specialists, but that in general they did not understand the page. However, the main lesson learnt from this element of the project is to know what your page is for. As previously mentioned, the purpose of the project was to make the archives accessible to a wider audience; from our research, we know that a range of audiences, from children doing homework to academics to people just looking for images, browse the art collection (Villaespesa, 2014: 6). However, these Fonds pages are technically complex, and we needed to explore this further. Rather than trying to make sure the page catered for all audiences, we should have followed the user testing, which suggested that these pages are really aimed at “super-users”: people who are highly engaged with the archive material. Just because we could offer faceted searching in individual collections did not necessarily mean we should have, if it did not support the purpose of the page.
What is an archive?
Although we were integrating the art and archive collections into the same interface, there was a lot of discussion about whether we should signpost them as different collections, such as a category displayed in the search results or a banner at the top of each archive item page. However, a round of usability testing highlighted a greater challenge: users did not know what the term “archive” meant. There was partly a lack of awareness that an art gallery could have an archive; but in general, the context of being online meant that the word “archive” had different associations, such as old emails or old websites, and we saw that users understood it to mean a distinction either between historic artworks (archive works) and contemporary artworks, or between works on and off display (being “in the archive” meant an object was in store).
This meant that introducing obvious archive signposting onto the site could cause more confusion for users instead of helping to distinguish the two collections—especially as, when probed, the majority of users felt it was more important to find the material rather than to know which collection it belonged to. Instead, we decided to introduce this material via a specific landing page and carried out informal paper testing in the gallery to trial three names for the section: “Tate Archive”; “Artists’ archive”; and “Sketches & letters.” Even though this was not the most rigorous method, it was enlightening. Both “Tate Archive” and “Artists’ archive” maintained the themes of either old versus new works or on versus off display:
“Work Tate has but not on display” – generalist
“Different works by artists, from the past” – generalist
However, introducing the types of materials in the title “Sketches & letters” not only meant that users understood it to be different material, but expanded this to talk about artist processes:
“Preliminary stuff” – generalist
“Correspondence from artists, letters about work” – generalist
We therefore decided to lead with a variation of this title for the section—to make the idea of the content from the archive more accessible to the audience—before explaining more about the Tate Archive on the actual page. Whilst organisationally there is still uncertainty about this decision, and indeed we hope to use A/B testing to explore different naming options on a more quantitative basis, it was an important lesson that what may be obvious within your organisation may not be that obvious for outsiders.
5. Lessons learnt
This project has been very successful in meeting its objectives; however, the archive team has learnt a number of lessons while working on this digitisation process and workflow. These are as follows:
- Do not underestimate the need to check original cataloguing.
- Think outside processes that are already in place.
- Think about this in terms of a project team to aid communication, rather than as separate systems working together.
- Think carefully about what exactly you want to do: do you need to capture everything, or would excluding the more complex items create a much simpler workflow?
- Make sure there is enough contingency in both budget and time.
- Do not underestimate the impact on time staff changes will have; do not assume everyone will stay for the duration of their contracts.
- Acknowledge that errors will be made, and make sure that any system that is used can accommodate this.
- Do not underestimate how much of an impact a large digitisation project will have on the day-to-day running of the archive. If this is a priority, make sure that this is acknowledged.
- Make sure everyone has access to all the databases and someone knows how to use every system in the workflow.
There were also a number of lessons learnt from integrating the material into the “Art & artists” section of the Tate website:
- Use evidence, whether that be a full-on survey, a user-testing session, or informal in-gallery testing, to make your decisions.
- If they work, make use of existing models, templates, and design concepts.
- Do not leave core pieces of work until the end of the project when they cannot be fully tested.
- Know the scope of your project: an integration project is a lot of work and not the best time to start lots of new initiatives.
- Keep content complexities in perspective by analysing how often they really do occur.
- Know the purpose and audience for each of your templates: just because you can do it does not mean you should.
- What may be obvious within your organisation may not be obvious for outsiders.
- Compromises will happen: archive material is complex, and there is not always an easy answer.
This project has resulted in many positive outcomes. It has improved cross-departmental communication, increased awareness of the archive both with staff and external readers, enabled the archive to reach new audiences, and allowed completely different uses of the material. This was Tate’s first large-scale archive digitisation project, and it was complex and challenging; however, ultimately it was a very rewarding experience both for the individuals involved and for the institution as a whole.
Acknowledgements
We would like to thank all our colleagues involved in the Archives & Access project, whose hard work and expertise were behind the thinking and decisions we discuss in our paper. We would particularly like to thank Polly Christie, Jane Bramwell, Adrian Glew, and John Stack. Also thanks to John Langdon and Sebastien Francois for proofreading this paper.
References
Fildes, E., & E. Villaespesa. (2014). “Archives & Access project: ‘I am looking for’ or ‘what am I looking for’?” Blog. Last updated June 5, 2014. Consulted January 10, 2015. Available http://www.tate.org.uk/context-comment/blogs/archives-access-project-search-browse-and-inspire
Heritage Lottery Fund. (2014). “Tate launches world-wide access to unpublished archives of key British artists.” Last updated December 16, 2014. Consulted February 13, 2014. Available http://www.hlf.org.uk/about-us/media-centre/press-releases/tate-launches-world-wide-access-unpublished-archives-key
Heritage Lottery Fund. (2015). “The Difference We Want Your Project to Make.” Consulted February 11, 2015. Available http://www.hlf.org.uk/looking-funding/difference-we-want-your-project-make#Outcome_heritage
Stack, J. (2010). “Tate Online Strategy 2010-12.” Tate Papers Issue 13, Tate. Last updated April 1, 2010. Consulted January 14, 2015. Available http://www.tate.org.uk/research/publications/tate-papers/tate-online-strategy-2010-12
Tate. (2000). “Insight: The digitisation of the Collection.” Last updated March 2012. Consulted January 25, 2015. Available http://www.tate.org.uk/about/projects/insight-digitisation-tate-collection
Tate. (2012a). “History of Tate.” Consulted February 11, 2015. Available http://www.tate.org.uk/about/who-we-are/history-of-tate
Tate. (2012b). “Our Priorities.” Consulted February 11, 2015. Available http://www.tate.org.uk/about/our-work/our-priorities
Tate. (2013). “Transforming Tate Britain: Archives & Access.” Consulted February 13, 2015. Available http://www.tate.org.uk/about/projects/transforming-tate-britain-archives-access
Tate. (2014). “Creative Commons Licences and Tate.” Last updated October 2014. Consulted February 11, 2015. Available http://www.tate.org.uk/about/who-we-are/policies-and-procedures/website-terms-use/copyright-and-permissions/creative-commons
Villaespesa, E. (2014). Art & artists. Digital audience research report: Understanding people’s motivations and usage of the online collection. Consulted January 15, 2015. Available http://www.tate.org.uk/download/file/fid/37523
"What the Fonds?! The ups and downs of digitising Tate’s Archive." MW2015: Museums and the Web 2015. Published February 15, 2015.