Computer-aided research in Medieval Japanese History
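This is a report given in September 1998 at a conference of European specialists on Japan-related resources, held at the Catholic University of Leuven, Belgium.
It is far too old by now, but because the home page connected with my former workplace has been closed, I have moved it here to preserve the data.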
Computer-aided research in Medieval Japanese History: Its status quo and future.
24-09-1998 HOTATE Michihisa
Historiographical Institute, The University of Tokyo.
The first problem we encounter when discussing the use of computers in historical and archival studies in Europe and Asia appears to be the quantity of historical documents we have in each of our countries. Moreover, to get an idea of how many documents there actually are, we already need a computer. So this argument leads to a kind of tautology, but I think that in the future we will know the amount of written material by the number of bytes it requires.
Consequently, since we do not yet know the exact number of written documents, I can only give approximate numbers. For instance, the "Shousouin-monjo"(「正倉院文書」), documents of the 8th century, consist of about 10,000 documents. The "Heian-ibun", compiled by Rizo TAKEUCHI and covering the 9th to the 12th century, consists of 5,500 documents. The "Kamakura-ibun", also compiled by Rizo TAKEUCHI and covering the end of the 12th century to the middle of the 14th century, consists of 33,000 documents. For the period from the end of the "Kamakura-era" to the beginning of the "Edo-era" we have no means of counting the documents, but there seem to be over 200,000. Furthermore, we have diaries("日記") by aristocrats, various literature("物語", "和歌", etc.), classics("典籍") and numerous religious documents("聖教"). In any case, as far as the archives from the 8th to the 12th century are concerned, Japan has the largest extant collection compared with any other country, especially in Asia.
Of course these are not concrete statistics, mainly because there are still enormous numbers of medieval documents that have been neither published nor catalogued. Most of them are in the stacks of big Buddhist temples, and sometimes they are written on the reverse pages of sutras. Furthermore, as you know, manuscripts written with a brush(毛筆) on thin Japanese paper are difficult to read, especially when they are written in various and arbitrary script forms.
Consequently, the compilation of Japanese medieval archives is hard labour which will still take a long time. For instance, the compilation of the "Todaiji-monjo", the most famous archive of Japanese ancient and medieval history, will, under present conditions, barely be finished by around 2200 AD. Thus, the reason why the working group of the Historiographical Institute decided to make databases of historical documents is that we wanted to accelerate and streamline the compilation of archives by computer.
If you would like to have a look at our work, the only thing to do is to visit our home page with a powerful computer that can read and write Japanese freely and that is equipped with an operating system specialized for the Japanese language and its characters. Since perhaps not every one of you possesses such a computer, I will explain the outline of our project.
1, Image database of documents(「画像史料データベース」).
Two different formats can be consulted. The first format is the "Eishabon"(影写本) version of the document. "Eishabon" are elaborate copy-books of documents written by brush(毛筆), made in pre-war times, when photographs were not yet frequently used for historical research. About 7,000 volumes containing perhaps 200,000 medieval historical documents are held in the institute. Because of their value, we must get permission from their owners before opening them on the internet. As a result, only two series are available at present, namely the "Toji-hyakugou-monjo" and the "Daitokuji-monjo". On the other hand, these two series are among the most important collections of medieval manuscripts. Since we are negotiating with other document owners, mainly Buddhist temples and Shinto shrines, I hope we will be able to offer more series in the near future. If you want to see the "Eishabon" image files, you need software that can read "TIFF G4" image files, as mentioned on our home page.
The second format is a photo-image of the document. Currently available are photographs of the "Irikiin-monjo", which is known for its compilation and translation into English by Kanichi ASAKAWA. These are JPEG image files. In the not too distant future all documents held by the institute, for example the "Shimazu-monjo", will have their photo-images on the internet.
2, Full-text database of historical materials.
We have two full-text databases of medieval documents on the Web site of the Historiographical Institute. These can be accessed in about one second even from Europe, which is much faster than the image databases. Of course you cannot get whole texts from this system, because it is only a KWIC(Key Word In Context) search system. What you get is a 20-character line with the keyword at its center, but you can get complete information on where a word, or a single character, appears in various documents.
The first database is the "Full text Database of Japanese old archives"(「古文書フルテキストデータベース」). We started building this database last year with the Government Grant for the promotion of scientific research. At this moment our database consists of the complete content list of the "Dainihon-Komonjo-Iewake"*1, a total of 120 volumes and about 42,000 documents, and the full text of several books from this series*2. We are still at a very early stage, but with governmental aid, I expect that we will finish this work within the coming 10 or 15 years.
The second database is the "Full text Database of historical materials in Heian-era"(「平安時代フルテキストデータベース」). It now consists of the "Heian-Ibun"*3, compiled by Rizo TAKEUCHI, and the text of several books which are incorporated in the "Dainihon-Kokiroku"*1. Furthermore, a CD-ROM of the "Heian-ibun" has been published*2. This CD-ROM gives not only information about a certain keyword, but also the relevant texts.
In Japanese history the Heian-era is appreciated for its classical culture. Since I myself specialize in Heian history, I expect this database to promote historical studies performed not only by historians, but also by researchers in literature or religious history. I really do think that the moment we can also integrate literary works, such as the "Genji-monogatari" or the "Konjaku-monogatari", into these databases, we will have reached a new stage in historical research. Furthermore, we are planning to input the "Kamakura-Ibun"*3, another compilation by Rizo TAKEUCHI. When this is accomplished, research in medieval history will change rapidly.
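A minimal sketch of how such a KWIC lookup can work, assuming the text is held as a plain string; the function name `kwic`, the `width` parameter and the sample string are illustrative assumptions, not the institute's actual system or data.

```python
def kwic(text, keyword, width=20):
    """Return (position, context_line) for every occurrence of keyword.

    Each context line is at most `width` characters long, with the
    keyword as close to the center as the edges of the text allow.
    """
    if not keyword:
        return []
    pad = max(width - len(keyword), 0)   # characters left over for context
    left = pad // 2                      # context shown before the keyword
    right = pad - left                   # context shown after the keyword
    results = []
    start = text.find(keyword)
    while start != -1:
        begin = max(start - left, 0)
        end = min(start + len(keyword) + right, len(text))
        results.append((start, text[begin:end]))
        start = text.find(keyword, start + 1)
    return results


if __name__ == "__main__":
    # Invented sample string, not taken from any real document.
    sample = "稲五束と米一俵、また稲十束"
    for position, line in kwic(sample, "束", width=5):
        print(position, line)
```

Returning the position together with the context line reflects the point made above: the user does not receive the whole text, but does learn exactly where each word or character occurs.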
3, Catalogue Database of Historical Documents.
We also have two catalogue databases, the "Catalogue Database of Historiographical Ins.'s Library"(「史料目録データベース」), and the "Catalogue Database of Historical Archives"(「古文書目録データベース」).
The "Catalogue Database of Historiographical Ins.'s Library" is a catalogue database of our departmental library. The data on every single historical document, manuscript, picture, "Eishabon", and photo-book of historical documents (26,475 volumes in total) that our library possesses were recently computerized at high speed, so this database now has an important status among our databases. For instance, by inputting the name of a document owner, you get all the information on the research ever done since our Institute's foundation. It is very useful for historians and archivists. The image databases mentioned above are linked to this database and are normally accessed from this site.
The second database, the "Catalogue Database of Historical Archives", is limited to "Eishabon" only. This database arranges and names the "Eishabon" documents according to the original structure and rank of the archives: series, fonds, cell, etc. Consequently, only an archivist with sufficient paleographical knowledge can do the necessary preparations before the data are input. Moreover, we need to identify the various documents by name, and only then can we link the image data, the full-text data, and the manuscripts themselves. Making a catalogue of 200,000 documents is not simple work. We have received the Government Grant for the promotion of scientific research for about 8 years, but even now we cannot say exactly when this work will be finished.
I have given you a simple explanation of some of the databases available on our Web site. Because of my own specialty, my explanation was limited to those databases that contain information on medieval documents. But if you specialize in modern Japanese history, for example, and are especially interested in the history of international relations between Japan, Asia and Europe, you can also find databases on our Web site that might be interesting and useful for you*1. Unfortunately, I am not the right person to introduce these databases. So instead, I would like to say something about what I think are the major issues concerning full-text databases.
In fact, I must admit that when I started these database projects, I was inspired by a lecture given by Leopold GENICOT in 1982, during his visit to Japan, which has been translated into Japanese*2. He said in his lecture that if you want to use the computer for historical studies, that is, for document criticism, restoration, exact compilation and interpretation, historical materials must be put into machines in full.
Figure 1 below is an example list of the term "defensio" in all Belgian historical documents before 1000 A.D., which he showed in his lecture. What we want to do is to build a similar system for Japanese medieval documents. Figure 2, which closely resembles Figure 1, is an example list of the term "束", which means a "bunch of rice grass with grains", retrieved from our system. This term often appears in documents from the 8th century to the first quarter of the 11th century, when tax payment mostly took this form. After this period the term almost disappears from documents, because tax or rent was from then on paid by the straw bag("Tawara",俵). Thanks to this list we can thus identify this important change. In fact, last month I wrote a paper on the tax system of the Heian-era, in Japan's early medieval times, using this list.
Next I would like to report on some technical problems that we have to solve. First of all there is the problem of processing all these documents: transferring the complex forms of Japanese archives to a digital format is not an easy task, since input by digital scanner is not possible.
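The kind of comparison described above, tracking how often "束" and "俵" appear over time, can be sketched as a count per century. This is only an illustration under stated assumptions: the function name `count_by_century`, the `(year, text)` input shape and the sample records are all hypothetical and do not come from the database.

```python
from collections import Counter


def count_by_century(documents, term):
    """Count occurrences of term per century.

    documents: iterable of (year, text) pairs, with year as a positive
    integer (AD). Returns a dict {century: count}, sorted by century.
    """
    counts = Counter()
    for year, text in documents:
        century = (year - 1) // 100 + 1   # e.g. 750 -> 8th century
        counts[century] += text.count(term)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Invented records for illustration only, not real documents.
    docs = [(750, "稲十束"), (790, "稲五束と稲二束"), (1150, "米一俵")]
    print(count_by_century(docs, "束"))
    print(count_by_century(docs, "俵"))
```

Centuries in which the term does not occur are kept with a count of zero, so that the disappearance of a term is visible in the result rather than silently omitted.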
We use HTML as the input format and a full-text retrieval system as the main database. As a result, we are able
1) to use the published content list(目次) to make hyperlinks to the relevant text.
2) to build a KWIC search system on the Web site of our institute.
3) to read, by clicking a highlighted word, the various notes attached to the text, which supply incidental information or indicate wrong characters.
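Point 1) above, turning a published content list into hyperlinks, can be sketched as follows. The function name `build_contents`, the document identifiers and the generated markup are assumptions made for illustration; the actual HTML used by the institute is not described here.

```python
from html import escape


def build_contents(entries):
    """Build an HTML list of links from a content list.

    entries: list of (doc_id, title) pairs. Each title links to the
    anchor "#doc_id", which is assumed to mark where that document's
    text begins in the same HTML file.
    """
    items = [
        f'<li><a href="#{escape(doc_id, quote=True)}">{escape(title)}</a></li>'
        for doc_id, title in entries
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    # Hypothetical identifiers and titles.
    print(build_contents([("doc-001", "Document 1"), ("doc-002", "Document 2")]))
```

Escaping the titles keeps characters such as "&" or "<" in a document name from breaking the generated page.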
As you know, the biggest trouble in building databases is pre-processing the documents to be input. But thanks to this system and to the existence of data input companies, we do not have to bother with this. Now we simply hand the books to a data input company. Nevertheless, various obstacles to completing the full-text database still remain. The most difficult is managing the special characters("gaiji"「外字」) which are not supported by our operating system. In fact, we have no solution to this problem. Or rather, it must be treated on a wider scale, as mentioned later. The only thing that we, as researchers, could do to contribute to the solution of this problem was to make a list of the special characters("gaiji") necessary for handling medieval Japanese documents. And that we did.
That is about all that I can say about our project. What remains is to give some of my ideas on how we can improve and expand the system.
First of all there is the problem of mistakes, which necessarily occur when compiling and publishing the documents and then again when transforming the documents into digital data. We are realists, i.e. we do not guarantee 100% correctness of our digital data, especially in our first stage. The reason for this is, to tell the truth, a lack of time to proofread the numerous galleys, which usually are more difficult to read than ordinary typography. Of course, the main purpose of our institute is to present correct, if possible completely correct, texts, but under the present circumstances this can only be achieved as far as printed matter is concerned, and not yet for digital data. At this moment we still put the emphasis on accumulating documents and opening them to the public. From the historian's point of view the main role of a database system at this time is to provide a search system, rather than an absolutely correct text.
Secondly, we have to think about how to promote the computerization of documents for other periods of Japanese history, but I do not have any concrete ideas on this point. First, as you know, Ancient Japan has important documents, for example the "Kojiki"(『古事記』), "Nihonshoki"(『日本書紀』), "Rikkokusi"(『六国史』), etc., but these are not digitized, and if they were, it would be difficult to make them accessible on a network because of copyright. I expect this problem to be solved in the near future. A more difficult problem concerns the accumulation and provision of digital data of documents from Japanese Early Modern times (the so-called "Edo-jidai",江戸時代). From that period there are so many documents that we do not yet have an overall plan for managing their digital data. I think this may be solved in two ways. The first is to convert microfilm data to digital or image data that can be opened directly on the internet, or to CD-ROMs that can be duplicated for distribution. This should be done by each public institute that possesses documents. Secondly, the form of compilation itself has to change. In the near future, the compilation of documents from Early Modern times must be done mainly on the Web, although the publication of books will remain necessary.
Thirdly, there is the problem of "Gaiji". It is obvious that we must have a list of "Gaiji" with codes and fonts to print or display them on a computer. Moreover, we want to be able to use these characters freely on the computer. But, as I mentioned above, this problem concerns various branches of computer technology and various disciplines within historical research. I have heard that several projects are now under way in Japan, such as "e-Kanji"(eー漢字), edited by Tetsuya KATSUMURA of Kyoto University (see his home page).
We have not yet decided on a policy concerning gaiji. In the database of the "Heian-era", we display a black square "■" for each "Gaiji". But in the database of "Japanese old archives", begun last year, we are planning to display the character number of the "Morohasi-daikanwa"(諸橋轍次『大漢和辞典』) instead of ■. We do this in anticipation of the completion of the above-mentioned projects, which will make it possible to replace these character numbers with the "Gaiji" themselves.
Obviously, this will be a great problem for European Japanologists, because in the tradition of Oriental Studies they deal not only with Japanese but often also with Chinese documents. As for Japan, the "gaiji" problem is a major one that concerns the international relations between the academies of Japan and other East Asian countries.
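The two display policies described above can be sketched together: a dictionary number stands in for each "Gaiji" in the stored text, and at display time it is shown either as the real character, when a mapping is available, or as "■". The reference notation `&M<number>;`, the function name `render` and the numbers used are hypothetical, chosen only for illustration; they are not the notation of our databases nor real dictionary numbers.

```python
import re

# Hypothetical notation: "&M01234;" marks a character by dictionary number.
GAIJI_REF = re.compile(r"&M(\d+);")


def render(text, characters=None, fallback="■"):
    """Replace each gaiji reference with a displayable character.

    characters: optional dict mapping a number string to the character
    to display. References without an entry are shown as `fallback`.
    """
    characters = characters or {}
    return GAIJI_REF.sub(
        lambda match: characters.get(match.group(1), fallback), text
    )


if __name__ == "__main__":
    stored = "A&M00001;B&M00002;C"              # invented numbers
    print(render(stored))                       # no mapping available yet
    print(render(stored, {"00001": "X"}))       # partial mapping
```

Keeping the number in the stored text, rather than the black square, is what makes a later replacement possible: the square discards the identity of the character, while the number preserves it.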