Blogpost: Phaedra Claeys: On spreadsheet nightmares and database dreams

As part of our recent Digital Humanities Doctoral School's programme, participants were asked to write a blog post capturing their experiences with the digital humanities. In this week's blog post, we welcome Phaedra Claeys, a doctoral student in the Languages and Cultures department, section Eastern European Languages and Cultures (Faculty of Arts and Philosophy, Ghent University). Phaedra is affiliated with the research group Ghent Centre for Slavic and East European Studies (GCEES). She obtained a master’s degree in Eastern European Languages and Cultures (Russian and Slovene) from Ghent University (2013-2014) and also holds a master’s degree in French (2015-2016). Since October 2016, she has been working on a PhD project on the interwar Russian émigré newsmagazine ‘Illustrated Russia’ under the supervision of Ben Dhooge.


On spreadsheet nightmares and database dreams: Developing a database for the research project on la Russie illustrée

1 periodical. 15 volumes. 748 issues. Over 18,000 pages. 1 Excel file. These are the main ingredients of my past and current nightmares. I will admit that this is a bit exaggerated, but collecting and managing a vast amount of data whilst trying to maintain an overview in Excel spreadsheets can get on your last nerve. It inevitably results in endless scrolling, and since typos are easily made, there is a good chance that some data eventually becomes irretrievable. This is the main reason I enrolled in the Doctoral School on Digital Humanities. As a total neophyte in everything digital, my hopes and expectations were straightforward: to learn as much as possible about how to successfully develop databases.

Being new to the field has its benefits and drawbacks. Since virtually everything was new, even the remotest connection between my research and a certain tool or approach provided new perspectives on the digital in my humanities. Reflecting on other, somewhat similar cases often shed light on new ways of (not) tackling my data management perils. The input of both the lecturers and the other students also served as a fresh pair of eyes. On the downside, my limited knowledge of the terminology, combined with the digital community’s penchant for abbreviations, caused some serious hold-ups: while trying to figure out the meaning of DTA, LDA, OCR and API, I often failed to hear the rest of the explanation.

There was, however, one thing I understood almost immediately: practically every tool in DH requires some degree of coding. Despite – or perhaps precisely because of – the ubiquity of coding, we never really learned why it is imperative. Although convinced of the usefulness of coding, Lee Ann Cafferata, a ‘digital historian’, argues in her dissertation research blog The Notebook that “pushing humanists to learn to code for the sake of coding equates with learning how to use a tool without understanding where, when, and why it’s useful”. And that is exactly how I felt: ignorant of the what, how and why. But apart from the lack of theorizing on the necessity of coding, there was also no practical introduction: we hit the command line right away. For someone who didn’t even know what a command line was – let alone what to do with it – I had my fair share of bungling. Offering a crash course in Python for lost causes like myself would have been a plus, especially for a doctoral school that requires no preliminary knowledge whatsoever. Fortunately, there was still the internet and The Programming Historian to help me get my feet wet!
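To give a sense of the level such a crash course could start at, here is the sort of short exercise an introductory Python lesson might open with: counting which words occur most often in a text file. This is my own made-up example (including the placeholder file name article.txt), not an actual lesson from The Programming Historian:

    # Count the ten most frequent words in a plain-text file.
    # 'article.txt' is an invented placeholder file name.
    from collections import Counter

    with open("article.txt", encoding="utf-8") as f:
        words = f.read().lower().split()

    for word, count in Counter(words).most_common(10):
        print(word, count)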

Due to the diversity of DH as well as of the other participants’ projects, the doctoral school offered a broad overview of the field. As a result, most of the tools and approaches we explored were unfortunately not applicable to my own research. I was particularly looking forward to the session on digital text analysis, but sadly this lecture focused on tools for ready-made digital corpora. Since my corpus exists only as paper copies, I would have loved to learn more about how to make it more accessible through digitization and optical character recognition (OCR). So no matter how cool the maps, charts and diagrams in all shapes and colours imaginable that I saw people conjure up (while I lagged a good five steps behind), in my view the very foundation was missing. The philosophy behind the use of these digital technologies in the humanities, especially in literary studies, could also have been discussed more. Warning against a “digital wasteland”, Shirazi [1] wonders whether it is of any use to build digital tools “only to replicate the scholarly methods of the print age”. Although, from the perspective of my own research, I can immediately point out the benefits of a number of these digital tools, I agree that reflecting on this topic is essential for a better understanding – and thus a better application – of digital tools in the humanities.
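For what it is worth, the first step I was missing does not have to be daunting. Below is a minimal sketch of OCR-ing a single scanned page with Python, assuming the Tesseract engine is installed together with its Russian language data; the pytesseract wrapper and the path scans/page_001.png are my own illustrative choices, not anything covered in the doctoral school:

    # A minimal OCR sketch: turn one scanned page image into plain text.
    # Assumes Tesseract is installed with Russian language data ('rus');
    # the file paths are invented placeholders.
    from PIL import Image
    import pytesseract

    image = Image.open("scans/page_001.png")               # one scanned page
    text = pytesseract.image_to_string(image, lang="rus")  # Russian-language OCR

    # Store the recognized text next to the scan for later searching.
    with open("scans/page_001.txt", "w", encoding="utf-8") as out:
        out.write(text)

Looped over all 18,000-odd pages, even imperfect raw OCR output of this kind would already make the corpus searchable.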

The session on ‘Managing a digital humanities project’, however, somewhat filled this reflective gap and pointed my research in the right direction: not by offering a ready-to-use database, but by focusing on the importance of a well thought-out Data Management Plan. However trivial it may seem, I now get that it’s essential to understand the type of data you’re working with and to envision how every action will influence the next. Even though I’m still using Excel spreadsheets as we speak, this strategy has resulted in concrete steps and significant progress towards the creation of my database. Soon, my spreadsheet nightmares may turn into database dreams.
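For the curious: here is a minimal sketch of what the first tables of such a database could look like, using Python's built-in sqlite3 module. The table and column names are my own invented illustration, not the actual design of my database:

    # A sketch of a relational alternative to one big spreadsheet.
    # Table and column names are illustrative guesses only.
    import sqlite3

    conn = sqlite3.connect("illustrated_russia.db")
    conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce REFERENCES
    conn.executescript("""
    CREATE TABLE IF NOT EXISTS volume (
        id   INTEGER PRIMARY KEY,
        year INTEGER NOT NULL
    );
    CREATE TABLE IF NOT EXISTS issue (
        id        INTEGER PRIMARY KEY,
        volume_id INTEGER NOT NULL REFERENCES volume(id),
        number    INTEGER NOT NULL,
        pub_date  TEXT,
        UNIQUE (volume_id, number)  -- no duplicate issue numbers per volume
    );
    """)
    conn.commit()

Unlike a spreadsheet cell, the UNIQUE and REFERENCES constraints simply refuse the kind of typo that would otherwise slip in unnoticed – exactly the irretrievable-data problem a lone Excel file invites.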

References

[1] Shirazi, Roxanne. "A 'Digital Wasteland': Modernist Periodical Studies, Digital Remediation, and Copyright." In Creating Sustainable Community: The Proceedings of the ACRL 2015 Conference, March 25–28, Portland, Oregon, edited by Dawn M. Mueller, 192–199. Chicago: Association of College and Research Libraries, 2015.