KBR, in cooperation with UGent, ULB, VUB and UCL, invites you to attend a scholarly series on digital cultural heritage: the KBR Digital Heritage Seminar.
In this series, running from February to June 2022, we will virtually host three scholars who will present their work on cultural heritage, specifically on image processing. “The devil is in the details!” When it comes to digital cultural heritage, this is as true as “The devil is in the images!” Great efforts have been devoted to digitizing original collections in the cultural heritage sector. On the one hand, this helps greatly in promoting the collections and in giving the general public much easier access to them (e.g. by publishing the images on websites like our digital library Belgica). On the other hand, technologies still need to advance in order to fully exploit the information (e.g. texts) that is still locked inside the digitized images. For this series, we are very honored to welcome three researchers with extensive experience in image analysis, especially in extracting information from digitized collections.
Clemens Neudecker (Berlin State Library, Berlin, Germany) will present “New Tools for Old Documents – Layout Analysis and OCR with Deep Learning and Heuristics” on 11 April 2022, 14:00 - 15:30 CEST.
This talk will discuss the main achievements and experiences of the QURATOR project at the Berlin State Library (SBB) in document layout analysis. Historical documents that are being digitized in large quantities by libraries and archives frequently exhibit a wide array of features that complicate layout analysis, such as complex layouts with multiple columns, drop capitals and illustrations, skewed or curved text lines, noise, annotations, etc.
To deal with these challenges and defects, a robust document layout analysis method was developed, based on pixel-wise segmentation using convolutional neural networks. In addition, heuristic methods are applied to detect columns or marginalia, and to determine the reading order of text regions. A key objective is to feed the resulting outputs into subsequent processes such as a text recognition (OCR) engine or an image similarity search.
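To make the idea of a reading-order heuristic concrete, here is a minimal sketch in Python. It is not the QURATOR implementation: the `Region` type, the `reading_order` function and the 50% overlap threshold are assumptions chosen purely for illustration. The sketch groups text regions into columns by horizontal overlap, then reads the columns left to right and each column top to bottom.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Axis-aligned bounding box of a text region (hypothetical type)."""
    name: str
    x0: int
    y0: int
    x1: int
    y1: int


def reading_order(regions):
    """Order regions column by column, top to bottom within each column."""
    columns = []  # each column is a list of regions sharing a horizontal span
    # Visit regions from left to right so columns are created in reading order.
    for region in sorted(regions, key=lambda r: r.x0):
        for column in columns:
            col_x0 = min(r.x0 for r in column)
            col_x1 = max(r.x1 for r in column)
            # Width of the horizontal overlap between region and column span.
            overlap = min(col_x1, region.x1) - max(col_x0, region.x0)
            # Assumed threshold: more than half of the region's width overlaps.
            if overlap > 0.5 * (region.x1 - region.x0):
                column.append(region)
                break
        else:
            columns.append([region])  # no matching column: start a new one

    ordered = []
    for column in columns:
        ordered.extend(sorted(column, key=lambda r: r.y0))
    return ordered
```

A simple rule like this only works for clean multi-column pages; headings that span several columns, marginalia or skewed scans would need additional handling.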
About the speaker
Clemens Neudecker studied Philosophy, Computer Science and Political Science at Ludwig Maximilian University (LMU) of Munich. For more than 15 years, he has been working in R&D at various digital libraries, including the Bavarian State Library and the National Library of the Netherlands. Clemens is currently a researcher and project coordinator at the Berlin State Library. He is also a member of the Council at Europeana, the European Union’s digital platform for cultural heritage.