Support Projects for Enhancing Function
Developing a Comprehensive Database of Pre-Modern Books Using Kuzushiji Optical Character Recognition (OCR) Technology
Fiscal 2016
We aim to dramatically improve the use of rare books and old materials held at the Theatre Museum by applying kuzushiji OCR technology, to create a new environment that promotes research, and to enhance databases related to rare books.
Summary of research findings, fiscal 2016
We positioned this fiscal year as a basic preparation period for creating a comprehensive database of rare books, choosing joruri maruhon and kabuki banzuke as subject materials. We created a character style database and a new display system for it in order to improve the accuracy and practical applicability of the kuzushiji OCR functions. Going forward, we will expand the use of OCR technology while considering applications that increase overseas researchers’ interest in rare books and spread this decipherment technology.
Joruri maruhon, publications that contain the joruri text together with annotations for oral performance, spread widely throughout the country after the mid-17th century. Joruri maruhon are held at museums throughout Japan; however, the Theatre Museum boasts the largest collection in the country. This year, we examined Kanadehon Chushingura and created a character style database of approximately 48,000 characters. We focused on its unique typefaces and characteristic typesetting (narrow spacing, inclination, etc.) in order to develop a new display method that faithfully reflects the position and format of the characters within the document.
Kabuki banzuke, or playbills listing the program, casting, and other details of a performance, were published in large quantities after the mid-17th century. The Theatre Museum has a collection of banzuke from all corners of the country, including Edo and Kamigata, covering the entire Edo period. Although the characters in these documents are arranged regularly, the variation in character width poses a technical challenge. This year, we selected 18 one-page kaomise banzuke, which introduce the cast of each kabuki company in the new year, and 18 booklet-style yakuwari banzuke, which were released for each show. From these we created a special character style database by grouping characters by element, such as frequently used surnames.
An example of the new display method for the reprinted text of Kanadehon Chushingura
See here for the kuzushiji viewer
A character style database that systematically collects character data extracted from rare books will allow us to improve reprinting technology. In addition, it will help beginners and foreign researchers learn decipherment techniques and will allow applications such as linking to related databases based on the released data. To this end, we are using an innovative, versatile new display system that can be operated intuitively, having sought technical advice from Toppan Printing Co., Ltd. on building such a system. We aim to improve the accuracy of the OCR and make the database available on the web, while keeping in mind the combination of innovative technology with expert knowledge, the systematic analysis of areas such as the singing methods indicated by the letter notation in joruri books, and the development of a database of banzuke that record box office information.
Character style database for the character “と” (to) from Kanadehon Chushingura