Multilingual dictionary
The following presents the object-oriented model that ClassiX® uses to represent a multilingual dictionary. It begins with an overview of the linguistic terms used and their interrelations, followed by the data model derived from them.
Morpheme
The overview begins with the smallest meaning-bearing unit of a language, the morpheme. A morpheme can also be described as the smallest semantically interpretable element of a word; a word consists of one or more morphemes.
Morphemes are realised phonetically as sound sequences (units of a sound system, phonetics), phonologically as phoneme sequences (units of a speech system, phonology) and in writing as grapheme sequences (units of a writing system). These sequences of sounds, phonemes or graphemes represent the morpheme in certain environments. The sounds, phonemes and graphemes themselves carry no meaning of their own; as building blocks of the morpheme they only serve to differentiate meanings.
Words
A word or a combination of words is (only) the linguistic expression of a term, that is, the direct designation of a term via its meaning-bearing morphemes. While the term holds the mental idea of the object or fact it designates (the meaning), the designation is the linguistic sign that refers to the intended object or fact (the name). A designation consists of a single word (one-word designation), e.g. dog (simplex) or dog owner (compound), or of a group of words (multi-word designation), e.g. Faraday cage.
Meaning is, on the one hand, the knowledge of how a word or expression is customarily used within a language community and a given context. On the other hand, meaning is what someone understands on the basis of a sign or a linguistic expression.
A word can have one meaning (univocal) or several (equivocal). Univocity is the unambiguousness of the relationship between word (sign) and meaning. If this unambiguousness is missing, the word is ambiguous (equivocal). Equivocality in the broader sense is any ambiguous relationship between sign and signified. In the narrower sense, equivocal refers only to words that have the same sound but different meanings.
Equivocality (ambiguity) in the broader sense occurs as homonymy (same name), synonymy (same sense) or analogy.
If a linguistic expression (word, sign) of the same form has several meanings (a one-to-many relationship between sign and meaning), one speaks of a homonym (in the broader sense) or of an equivocal expression (in the narrower sense). Example: German "Bank" for a financial institution or a bench. In some cases the term homonymy is used only if the meanings have different word origins. Example: German "Tau" for rope, from Low German, and for dew, from Old High German tou. This homonymy in the narrower sense is then distinguished from polysemy, the ambiguity of a word whose meanings share one origin. Example: "horse" for the animal and for the gymnastic apparatus.
Synonyms are different linguistic expressions with the same meaning (examples: "grandpa" = "grandfather"; "white horse" = "grey horse"). Synonymy is a many-to-one relationship between sign and designated. Besides this synonymy in the narrower sense, the term is also used in a broader sense for words that are merely related in meaning.
Strict synonymy requires not only that two lexical signs have the same denotative meaning, but also that they are interchangeable in all contexts and have the same effect in all contexts. Examples of strictly synonymous pairs in German are, according to general opinion: Apfelsine - Orange (orange); Streichholz - Zündholz (match); Couch - Sofa; Hubschrauber - Helikopter (helicopter).
A homoionym (also called a partial synonym) is often confused with a synonym in the strict sense. "Car" and "motor vehicle", for example, are homoionyms, because the two are not interchangeable in every context.
The opposite of synonymy is antonymy (beautiful/ugly; cold/hot).
Analogy is a special case of ambiguity and "exists when the various designated objects are in a certain relationship of dependence on each other or have a certain equality of structures".
A homograph or homoglyph is a word from a group of words that all have the same spelling but different meanings and often a different pronunciation (example: German "Rentier", meaning either a reindeer or a person living on private income). If the pronunciation is also the same, the word is a homophone as well.
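The relationships between signs and meanings described in this section can be illustrated with a small lookup structure. The following is only a sketch: the sample pairs, the meaning labels and the function names are assumptions made for illustration and are not part of ClassiX®.

```python
from collections import defaultdict

# Hypothetical sample data: (word, meaning) pairs with made-up meaning labels.
PAIRS = [
    ("Bank", "financial institution"),
    ("Bank", "bench"),
    ("grandpa", "father of a parent"),
    ("grandfather", "father of a parent"),
    ("sofa", "upholstered seat"),
]

# Index the relation in both directions: sign -> meanings and meaning -> signs.
MEANINGS_OF = defaultdict(set)
WORDS_FOR = defaultdict(set)
for _word, _meaning in PAIRS:
    MEANINGS_OF[_word].add(_meaning)
    WORDS_FOR[_meaning].add(_word)


def is_univocal(word):
    """True if the word has exactly one recorded meaning."""
    return len(MEANINGS_OF.get(word, ())) == 1


def is_equivocal(word):
    """True if the word has more than one recorded meaning."""
    return len(MEANINGS_OF.get(word, ())) > 1


def synonyms(word):
    """All other words that share at least one meaning with the given word."""
    found = set()
    for meaning in MEANINGS_OF.get(word, ()):
        found |= WORDS_FOR[meaning]
    found.discard(word)
    return found
```

With this data, `is_equivocal("Bank")` holds because two meanings are recorded, and `synonyms("grandpa")` yields the set containing "grandfather". Distinguishing homonymy from polysemy would additionally require the word origin, which this sketch does not store.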
Terms
Terms are related to each other in various ways. The generic (broader) term of a term is called a hyperonym, the subordinate (narrower) term a hyponym (vegetable is a hyperonym of bean; tomato is a hyponym of vegetable). There is no limit to the number of levels (mammal is a hyponym of animal, dog is a hyponym of mammal, basset is a hyponym of dog).
The thesaurus standard DIN 1463-1 and its international equivalent ISO 2788 provide for the following types of relations and associated abbreviations:
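A hierarchy of this kind can be walked level by level. The sketch below assumes that every term has at most one direct hyperonym; the dictionary `HYPERONYM_OF` and both function names are hypothetical.

```python
# Each term maps to its direct hyperonym (broader term); None marks a top term.
# Assumption for this sketch: a term has at most one direct hyperonym.
HYPERONYM_OF = {
    "basset": "dog",
    "dog": "mammal",
    "mammal": "animal",
    "animal": None,
    "bean": "vegetable",
    "tomato": "vegetable",
    "vegetable": None,
}


def hyperonym_chain(term):
    """Return all hyperonyms of a term, from the nearest to the top term."""
    chain = []
    parent = HYPERONYM_OF.get(term)
    while parent is not None:
        chain.append(parent)
        parent = HYPERONYM_OF.get(parent)
    return chain


def hyponyms(term):
    """Return the direct hyponyms of a term in alphabetical order."""
    return sorted(t for t, parent in HYPERONYM_OF.items() if parent == term)
```

For example, `hyperonym_chain("basset")` walks up through dog and mammal to animal, and `hyponyms("vegetable")` lists bean and tomato.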
DIN 1463-1 (abbreviation and designation) | ISO 2788 (abbreviation and designation) |
---|---|
BF - Used for | UF - Used for |
BS - Use synonym | USE/SYN - Use synonym |
OB - Broader term | BT - Broader term |
UB - Narrower term | NT - Narrower term |
VB - Related term | RT - Related term |
SB - Top term | TT - Top term |
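
The correspondence between the two sets of abbreviations can be kept as a simple lookup table. This is a minimal sketch based only on the table above; the names `DIN_TO_ISO`, `ISO_TO_DIN` and `to_iso` are hypothetical.

```python
# Abbreviations as listed in the table above (DIN 1463-1 -> ISO 2788).
DIN_TO_ISO = {
    "BF": "UF",       # used for
    "BS": "USE/SYN",  # use synonym
    "OB": "BT",       # broader term
    "UB": "NT",       # narrower term
    "VB": "RT",       # related term
    "SB": "TT",       # top term
}

# Reverse direction, derived from the same table.
ISO_TO_DIN = {iso: din for din, iso in DIN_TO_ISO.items()}


def to_iso(relation):
    """Convert a (term, DIN abbreviation, term) triple to ISO notation."""
    source, abbreviation, target = relation
    return (source, DIN_TO_ISO[abbreviation], target)
```

A relation such as `("dog", "OB", "mammal")` then becomes `("dog", "BT", "mammal")`.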
Languages
Language here means the verbal communication of people, consisting of words (of a vocabulary). Ferdinand de Saussure conceived of language as a system of signs and understood the linguistic sign as a binding connection between the sound image (signifiant, the signifier) and the concept (signifié, the signified), i.e. as something mental.
A grammar puts the words in relation to each other.
Words of different languages can be brought into a direct connection by their common meaning. The words "Baum", "tree" and "árbol" are three different words with the same meaning.
A written language represents a language using a writing system.
Writing systems
The main writing systems fall into four categories: logograms, syllabaries, alphabets and phonetic transcriptions:
- Logograms are pictograms, ideograms and other abstract symbols that each stand for a morpheme and thus carry a meaning of their own (example: Chinese characters).
- In a syllabary, the graphemes are assigned to sound groups (syllables). Examples of syllabaries are the Japanese kana scripts.
- An alphabet is a phonographic system: the signs stand for sounds which, when combined, produce words. In contrast to syllabic writing, the characters of the alphabet usually represent only one phoneme at a time.
- Phonetic transcription is a writing system designed to reproduce the pronunciation of sounds or sound sequences. It is mainly used for learning foreign languages. The best-known example is the International Phonetic Alphabet (IPA), which is used in most dictionaries.
Some special features: for Chinese characters (Kanji in Japanese) there are often several alternative ways of writing one and the same ideogram. In Japanese, a character can also be pronounced differently depending on whether the ON or the KUN reading is used. Japanese uses three writing systems simultaneously: Kanji (Chinese characters) and the two syllabaries Hiragana and Katakana.
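That a single Japanese text mixes three writing systems can be checked character by character. The following sketch uses the Unicode character names from Python's standard `unicodedata` module as a simple heuristic; the function names are hypothetical, and labelling every CJK unified ideograph as "kanji" is an assumption that only makes sense for Japanese text, since the same code points are used for Chinese.

```python
import unicodedata


def script_of(char):
    """Classify one character by the prefix of its Unicode name (heuristic)."""
    name = unicodedata.name(char, "")
    if name.startswith("HIRAGANA"):
        return "hiragana"
    if name.startswith("KATAKANA"):
        return "katakana"
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "kanji"  # assumption: the text is Japanese
    return "other"


def scripts_used(text):
    """Return the set of writing systems found in a text, ignoring whitespace."""
    return {script_of(char) for char in text if not char.isspace()}
```

Applied to a phrase such as "漢字とカタカナ", the function reports kanji, hiragana and katakana together.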
The object-oriented model in ClassiX®
Abstracting the terms introduced above leads to the following model:
One of the characteristics of a language (a language system) is that it defines its graphic and phonetic transcription systems for the (spoken) words.
A word is first defined in its relation to a language and its relation to a graphic transcription chosen in that language. The word element contains the graphic representation of the spoken word. A word also has N relations to meanings, which, however, are further specified via property elements.
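The relations just described can be sketched with plain data classes. This is not the actual ClassiX® class model: all class, field and function names below are hypothetical, and the property elements are reduced to a simple dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptionSystem:
    name: str  # e.g. "Latin alphabet" or "IPA"
    kind: str  # "graphic" or "phonetic"


@dataclass
class Language:
    name: str
    # The transcription systems a language defines for its words.
    transcriptions: list = field(default_factory=list)


@dataclass(frozen=True)
class Meaning:
    description: str


@dataclass
class MeaningLink:
    """One of the N word-to-meaning relations, refined by properties."""
    meaning: Meaning
    properties: dict = field(default_factory=dict)


@dataclass
class Word:
    text: str  # graphic representation of the spoken word
    language: Language
    transcription: TranscriptionSystem
    meanings: list = field(default_factory=list)

    def __post_init__(self):
        # The transcription must be one that the word's language defines.
        if self.transcription not in self.language.transcriptions:
            raise ValueError("transcription is not defined for this language")


def translations(word, vocabulary):
    """Texts of words in other languages sharing a meaning with the word."""
    own = {link.meaning for link in word.meanings}
    return [
        other.text
        for other in vocabulary
        if other.language is not word.language
        and own & {link.meaning for link in other.meanings}
    ]


# Hypothetical sample data.
LATIN = TranscriptionSystem("Latin alphabet", "graphic")
GERMAN = Language("German", [LATIN])
ENGLISH = Language("English", [LATIN])
SPANISH = Language("Spanish", [LATIN])
TREE = Meaning("perennial woody plant")
VOCABULARY = [
    Word("Baum", GERMAN, LATIN, [MeaningLink(TREE)]),
    Word("tree", ENGLISH, LATIN, [MeaningLink(TREE)]),
    Word("árbol", SPANISH, LATIN, [MeaningLink(TREE)]),
]
```

Because words of different languages point to the same `Meaning` object, `translations(VOCABULARY[0], VOCABULARY)` finds "tree" and "árbol" for "Baum" without any direct word-to-word link.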
The grammar rules
By adding further information to a word, such as part of speech (noun, verb, adjective, etc.), number (singular, plural, plurale tantum, etc.), case and gender, the words and all their characteristics can be output automatically by means of fixed grammar rules for the individual languages. For nouns, for example, this covers the definite article, the singular and plural forms, and the declension.
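A fixed rule of this kind can be sketched for German nouns in the nominative case. The example is deliberately reduced: the class `Noun` and the function `describe` are hypothetical, the plural form is stored rather than derived, and the remaining cases of the declension are left out.

```python
from dataclasses import dataclass
from typing import Optional

# German definite articles in the nominative case.
DEFINITE_ARTICLE = {"masculine": "der", "feminine": "die", "neuter": "das"}
PLURAL_ARTICLE = "die"


@dataclass
class Noun:
    singular: Optional[str]  # None for a plurale tantum
    plural: Optional[str]    # None if no plural form is recorded
    gender: Optional[str] = None


def describe(noun):
    """Output the recorded forms of a noun with their definite article."""
    forms = []
    if noun.singular is not None:
        forms.append(f"{DEFINITE_ARTICLE[noun.gender]} {noun.singular}")
    if noun.plural is not None:
        forms.append(f"{PLURAL_ARTICLE} {noun.plural}")
    return ", ".join(forms)
```

For `Noun("Hund", "Hunde", "masculine")` the rule produces "der Hund, die Hunde"; for a plurale tantum such as `Noun(None, "Eltern")` only the plural form "die Eltern" is output.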
The modules
The modules stored in ClassiX® within the AppsWarehouse® form the building blocks for a complete multilingual dictionary (the languages German, English, French, Spanish, Portuguese, Italian, Dutch, Swedish, Danish and Russian are already stored in the dictionary as standard):