Workshop: Opening up Born-digital Data for Researchers @ DHBenelux Conference

This workshop on born-digital data and research is organised by the research partners in the BELSPO-funded, KBR-coordinated BelgicaWeb project. This research project aims to (1) investigate how to sustainably provide access to Belgium’s born-digital collections for both the public and researchers; (2) create born-digital collections by capturing social media and web content; (3) aggregate and enrich existing (meta)data at KBR using linked open data, controlled vocabularies, Natural Language Processing and other digital methods; (4) develop the necessary data infrastructure by selecting the best (open source) technologies, sharing (open access) information and building on the best practices from international networks and infrastructures (e.g. DARIAH-EU, IIPC, WARCnet, RESAW); (5) analyze the relevant legal frameworks (e.g. data exchange, copyright in the context of text and data mining, data protection and privacy rights, and freedom of expression); and (6) promote and raise awareness about Belgium’s born-digital heritage.

The workshop will take place on 4 June 2024 from 13:30 to 16:30 during the DHBenelux Conference at KUL. Registration is possible via the DHBenelux conference website (https://2024.dhbenelux.org/) starting in May.

The workshop aims to explore strategies, techniques, and methodologies to ensure the scientific exploitation and social valorization of born-digital archived (social media) content, thereby enabling new forms of data access. By convening experts from diverse disciplinary backgrounds, the workshop seeks to identify best practices, share insights, and facilitate new avenues for exploring archived social media. To achieve this, we will present various insights and tools, such as a framework for Web Archival Literacy in the form of a self-assessment questionnaire (an archival intelligence scale), and then solicit ideas and feedback from participants. To guide the discussions, the workshop will adopt the conceptual devices of ‘orientating,’ ‘auditing,’ and ‘constructing’ developed by Ogden & Maemura (2021). These devices, which describe common research practices and their associated challenges, will overlap during the workshop rather than being presented as a linear workflow or fixed set of practices.

In the ‘Orientating’ phase, dedicated to understanding the archive and its interface, emphasis will be placed on grasping the institutional collection development policies and strategies. This phase is crucial for comprehending how the archive is shaped and governed by its overarching goals and audience considerations. Particular attention will be given to exploring published collection development policies of relevant institutions, which serve as guiding frameworks for content selection. Additionally, discussions will revolve around the influence of these policies on shaping the archive’s content, and the potential implications for enhancing accessibility and engagement within the research community.

The ‘Auditing’ phase, centered on contextualizing data by tracing the history of collection practices and curation decisions, will address researcher needs for accessing born-digital content.

The workshop will end with the ‘Constructing’ phase (e.g., involving the selection and aggregation of data from sources across collections), in which researcher needs regarding content enrichment and its implementation will be discussed.