Conceptual Modelling of Networking of Centres for High-Quality Research in Slavic Lexicography and Their Digital Resources
The Institute for Information Transmission Problems (IITP) of the Russian Academy of Sciences (Kharkevich Institute), one of the leading research institutions of Russia, was founded on the initiative of Professor A.A. Kharkevich, member of the USSR Academy of Sciences, in 1961.
The research staff of the multidisciplinary Institute consists of mathematicians, physicists, biologists, engineers and linguists, who are engaged in fundamental research into problems of information transmission, distribution and processing, biology, and theoretical and computational linguistics.
The Laboratory of Computational Linguistics of IITP (LCL) is headed by Prof. Igor Boguslavsky (acting head Dr. Leonid Iomdin). The founder of the Laboratory, Professor Juri Apresjan, full member of the Russian Academy of Sciences and a prominent Russian linguist, headed the laboratory for over 10 years and remains its principal researcher. Other leading researchers include Dr. Vladimir Sannikov, Dr. Leonid Tsinman, Dr. Leonid Mitjushin, Dr. Svetlana Grigorieva and Dr. Tatiana Frolova. Most recently, LCL has been joined by several gifted university graduates: Konstantin Druzhkin, Vyacheslav Dikonov, Denis Valeev, and Anton Kazennikov.
The principal focus of research carried out in LCL is the functioning of natural language. Fundamental research aims at the elaboration of a fully operative linguistic model of the "Meaning ⇔ Text" type. The model is intended to simulate the language behaviour of humans, that is, their ability to produce texts in a natural language and understand them. The computerized version of the model developed by LCL takes the form of a multifunctional multilingual processor known as ETAP, which consists of large dictionaries of the working languages and various sets of rules. Ideally, the rules in combination with the dictionaries should simulate the language behaviour of humans in text production and interpretation. Integrated into individual modules, they are used as sophisticated applications. In particular, they underlie the operation of a number of NLP systems, such as English-to-Russian and Russian-to-English machine translation, generation of Russian texts from the semantic representation of an utterance in UNL, paraphrasing of sentences within a given natural language, and some others. Apart from being an instrument for handling practical tasks in NLP, such systems serve as an experimental testing ground, which enables the researchers to update linguistic descriptions and obtain new linguistic knowledge from the experimental data.
In the last few years, LCL has focused on the following four directions of research: work on theoretical problems of lexicography, creation of a deeply annotated corpus of the Russian language, development of UNL, the universal semantic language, and the extension and improvement of the functional potential of ETAP. SYNTAGRUS, the deeply annotated corpus of Russian texts, now consists of 30,000 sentences (500,000 words) with full morphological and syntactic tagging and is constantly growing.
Syntactic annotation is represented by a sophisticated dependency tree for each sentence, whose nodes correspond to every word of the sentence and are linked with elaborate syntactic relations. In future, semantic annotation of the corpus is envisaged.
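The annotation scheme described above can be illustrated with a minimal sketch. It is not the actual SYNTAGRUS format or the ETAP implementation: the class name, the field names, the helper functions and the relation labels ("determinative", "predicative") are hypothetical and chosen only for illustration. The sketch assumes that each word is a node carrying its morphological tags, the index of its governing word and the name of the syntactic relation linking the two, and it checks that the links form a single tree.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One word of a sentence in a dependency tree (hypothetical layout)."""
    index: int               # 1-based position of the word in the sentence
    form: str                # word form as it appears in the text
    lemma: str               # dictionary form
    tags: str                # morphological tags, kept as a plain string here
    head: Optional[int]      # index of the governing word, None for the root
    relation: Optional[str]  # name of the syntactic relation to the head


def is_tree(nodes: List[Node]) -> bool:
    """Return True if the nodes form one tree: a single root, no cycles."""
    by_index = {n.index: n for n in nodes}
    if sum(1 for n in nodes if n.head is None) != 1:
        return False
    for start in nodes:
        seen = set()
        current = start
        # Walk up from each word; every path must end at the root.
        while current.head is not None:
            if current.index in seen or current.head not in by_index:
                return False
            seen.add(current.index)
            current = by_index[current.head]
    return True


def dependents(nodes: List[Node], head_index: int) -> List[Node]:
    """Return the words directly governed by the word at head_index."""
    return [n for n in nodes if n.head == head_index]


# Illustrative English sentence; the relation labels are placeholders.
sentence = [
    Node(1, "The", "the", "ART", 2, "determinative"),
    Node(2, "corpus", "corpus", "N SG", 3, "predicative"),
    Node(3, "grows", "grow", "V PRES 3SG", None, None),
]

if __name__ == "__main__":
    print(is_tree(sentence))
    print([n.form for n in dependents(sentence, 3)])
```

Storing the head index on each word, rather than a list of children on each head, keeps one record per word, which matches the statement that the nodes of the tree correspond to every word of the sentence.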
Dr. Leonid Iomdin, acting head of the Laboratory of Computational Linguistics, represents IITP RAS in the project.