FRENCH TREEBANK

Analyzing and reproducing natural language requires understanding what a sentence means. To meet this need, the French Treebank, a corpus of more than 20,000 richly annotated French sentences, serves as a reference lexical and syntactic resource for linguists and computer scientists, particularly for automatic natural language processing.

Applications

Automatic natural language processing
Semantic web, search engines
Human-machine dialogue (chatbots)
Spellchecking
Automatic translation
Language teaching

Competitive advantages

Quality of the corpus: annotated with automatic tools, then corrected by hand in several successive passes over the different annotations
Available in four formats: XML (original format), Tiger-XML (the most complete format, including the components of compound words), PTB (constituent annotations), CoNLL (dependency annotations)
Rich annotation: domain, author, date; compound words (and their components), 218 morpho-syntactic labels, grammatical functions and syntactic constituent trees
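
As an illustration of how the dependency annotations might be consumed, here is a minimal Python sketch that reads CoNLL-style rows. It assumes the common ten-column, tab-separated CoNLL-X layout (index, form, lemma, coarse tag, fine tag, features, head, relation, and two unused columns); the actual column layout and label set of the distributed files should be checked against the corpus documentation. The sample sentence, its tags and its relation labels are invented for illustration and are not taken from the corpus; `Token` and `parse_conll` are hypothetical names.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    index: int   # 1-based position in the sentence
    form: str    # surface word
    pos: str     # coarse part-of-speech tag (column 4, assumed)
    head: int    # index of the governing token, 0 for the root
    deprel: str  # label of the dependency relation


def parse_conll(text: str) -> List[List[Token]]:
    """Split CoNLL-X style text into sentences, each a list of Token."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split("\t")
        current.append(
            Token(int(cols[0]), cols[1], cols[3], int(cols[6]), cols[7])
        )
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample, invented for illustration (not corpus data).
SAMPLE = "\n".join([
    "1\tLe\tle\tD\tDET\t_\t2\tdet\t_\t_",
    "2\tchat\tchat\tN\tNC\t_\t3\tsuj\t_\t_",
    "3\tdort\tdormir\tV\tV\t_\t0\troot\t_\t_",
])

sentences = parse_conll(SAMPLE)
# The root is the token whose head index is 0.
root = next(tok for tok in sentences[0] if tok.head == 0)
print(root.form)
```

Each sentence comes back as a list of tokens, so the dependency tree can be rebuilt by following each token's `head` index back to the root.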

Intellectual property

Corpus filed with the APP on 01/28/2018, IDDN no. FR 001 050008 000 D C 2008 000 10300

Keywords

Lexical resources - Syntactic resources - NLP
