ARABIC LEXICON

In most languages, common nouns, adjectives and verbs take many different forms in sentences, depending on the grammatical rules of the language. This is especially true of Arabic, where a single root of three consonants can generate hundreds of different forms.
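
The root-and-pattern mechanism can be illustrated with a minimal Python sketch, using Latin transliteration. The function name `apply_pattern`, the digit-based template notation and the short pattern list are assumptions made for illustration only; they are not part of the Arabic Lexicon technology, and a real root admits far more patterns than the five shown here.

```python
def apply_pattern(root, pattern):
    """Fill the slots 1, 2, 3 of a pattern with the three consonants of a root."""
    if len(root) != 3:
        raise ValueError("expected a root of three consonants")
    slots = {"1": root[0], "2": root[1], "3": root[2]}
    # Any character that is not a slot digit is copied unchanged.
    return "".join(slots.get(ch, ch) for ch in pattern)


# Hypothetical mini-inventory of patterns (transliterated), for illustration only.
PATTERNS = {
    "1a2a3a": "verb, perfect, 3rd person masculine singular",
    "1ā2i3": "active participle",
    "ma12ū3": "passive participle",
    "1i2ā3": "noun",
    "ma12a3": "noun of place",
}

ROOT = ("k", "t", "b")  # root associated with writing

# Map each generated form to the label of the pattern that produced it.
forms = {apply_pattern(ROOT, p): label for p, label in PATTERNS.items()}

for form, label in forms.items():
    print(f"{form:8} {label}")
```

With the root k-t-b, the sketch yields the forms kataba, kātib, maktūb, kitāb and maktab. A full lexicon also has to account for prefixes, suffixes, clitics and irregular roots, which this toy example ignores.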

While traditional dictionaries cover only a small fraction of the forms found in texts, our technology has been used to generate a database of 65,000 entries and their 6 million forms. It covers more than 98% of the forms found in any kind of text (literature, newspaper articles, etc.); the remaining 2% includes proper names.

Arabic Lexicon interfaces with Unitex, an open-source corpus processing system developed by the Gaspard Monge Laboratory (LIGM UPEM).
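
As a rough idea of how a lexicon of inflected forms can be queried, the Python sketch below reads entries written in a simplified version of the `form,lemma.CODE:inflection` layout used by Unitex inflected-form dictionaries. The sample entries, the grammatical codes and the helper names (`parse_line`, `lemmas_of`) are invented for illustration and are not taken from Arabic Lexicon; the parser also ignores escaped characters and extra semantic features that real dictionary files may contain.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    form: str
    lemma: str
    code: str
    inflections: tuple


def parse_line(line):
    """Parse one simplified 'form,lemma.CODE:infl1:infl2' line."""
    form, rest = line.strip().split(",", 1)
    lemma, _, tail = rest.partition(".")
    code, *inflections = tail.split(":")
    # Assumption of this sketch: an empty lemma field means "same as the form".
    return Entry(form, lemma or form, code, tuple(inflections))


# Hypothetical transliterated entries; the codes are made up for the example.
SAMPLE = """\
kitāb,kitāb.N:ms
kutub,kitāb.N:mp
kataba,kataba.V:P3ms
"""

# Index every entry by its inflected form for direct lookup.
index = defaultdict(list)
for raw in SAMPLE.splitlines():
    entry = parse_line(raw)
    index[entry.form].append(entry)


def lemmas_of(form):
    """Return the sorted lemmas recorded for a form (empty list if unknown)."""
    return sorted({e.lemma for e in index.get(form, [])})


print(lemmas_of("kutub"))   # a form present in the sample
print(lemmas_of("kutubb"))  # a misspelled form, absent from the sample
```

A lookup of this kind is the basic building block behind applications such as orthographic correction or word completion: a form that is absent from the index is a candidate misspelling, and a known form can be mapped back to its dictionary entry.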

Unitex Arabic has been presented to prestigious organizations such as the Al-Ghazali Institute of La Grande Mosquée de Paris and L’Institut du Monde Arabe. It can now be used in a wide range of domains, such as text editors, digitization of printed documents, data mining of Arabic web content and e-learning of Arabic.

Applications

Orthographic correction
Automatic word completion while typing
E-reputation analysis on websites
E-learning of the Arabic language
Digitization of documents

Competitive advantages

Accuracy
Exhaustiveness
Responsiveness

Intellectual property

Keywords

Semitic languages - Arabic - Orthography - Grammar - Unitex
