Analyzing and reproducing natural language requires understanding the meaning of each sentence. To meet this need, this corpus of more than 20,000 richly annotated French sentences serves as a reference lexical and syntactic resource for linguists and computer scientists, particularly for automatic natural language processing.
Applications
- Automatic natural language processing
- Semantic web, search engines
- Human-machine dialogue (chatbots)
- Spellchecking
- Automatic translation
- Language teaching
Competitive advantages
- Quality of the corpus: annotations produced by automatic tools, then corrected by hand in several successive passes over each annotation layer
- Available in four formats: XML (original format), Tiger-XML (the most complete format, including the components of compound words), PTB (constituent annotations), CoNLL (dependency annotations)
- Rich annotation: domain, author, date; compound words (and their components), 218 morpho-syntactic labels, grammatical functions, and syntactic constituent trees
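As an illustration of how the dependency annotations might be consumed, here is a minimal Python sketch that reads sentences from CoNLL-style text. It assumes the common 10-column, tab-separated CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) with a blank line between sentences; the exact column layout of this corpus's CoNLL export is not stated here and should be checked against the distributed files. The names `Token`, `read_conll`, and `SAMPLE`, as well as the tags and relation labels in the sample, are hypothetical and are not taken from the corpus.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    index: int    # 1-based position of the token in its sentence
    form: str     # surface form
    pos: str      # coarse part-of-speech tag (column 4, assumed)
    head: int     # index of the governing token, 0 for the root
    deprel: str   # dependency relation to the head


def read_conll(text: str) -> List[List[Token]]:
    """Split CoNLL-style text into sentences, each a list of tokens."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split("\t")
        current.append(
            Token(int(cols[0]), cols[1], cols[3], int(cols[6]), cols[7])
        )
    if current:
        sentences.append(current)
    return sentences


# Illustrative sample only: tags and labels are invented, not corpus data.
SAMPLE = "\n".join([
    "1\tLe\tle\tD\tDET\t_\t2\tdet\t_\t_",
    "2\tchat\tchat\tN\tNC\t_\t3\tsuj\t_\t_",
    "3\tdort\tdormir\tV\tV\t_\t0\troot\t_\t_",
])

for sentence in read_conll(SAMPLE):
    for tok in sentence:
        # Resolve the head index to a word form; index 0 means the root.
        governor = "ROOT" if tok.head == 0 else sentence[tok.head - 1].form
        print(f"{tok.form} --{tok.deprel}--> {governor}")
```

Keeping each sentence as a list of small immutable records makes it straightforward to look up a token's head by index, which is the basic operation most dependency-based applications build on.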
Intellectual property
Corpus filed with the APP on 01/28/2018, IDDN no. FR 001 050008 000 D C 2008 000 10300
Keywords
Lexical resources - Syntactic resources - NLP