Constituency parser in French

Question

I want to get constituency trees for French documents. I have tried to install several tools, but all the ones I found are quite old and I could not get them to work.

Benepar: it looks very interesting, but it does not seem compatible with Python > 3.9 and it requires an old torch version (see https://github.com/grimavatar/benepar/blob/master/setup.py). Otherwise I'd like to test the CTL library, which seems nice (https://stanfordnlp.github.io/CoreNLP/parser-standalone.html). I also tried SuPar, which could be an alternative, but it is also old and I could not get it working (https://github.com/yzhangcs/parser).
Stanford CoreNLP / Stanza: the most recent version of the Stanza French model does not implement constituency parsing (https://stanfordnlp.github.io/stanza/constituency.html); I didn't find another model on Hugging Face. So I am now trying to use the CoreNLP standalone parser, which has a French model (https://stanfordnlp.github.io/CoreNLP/parser-standalone.html) available as a .jar file: stanford-corenlp-4.2.1-models-french.jar.

If no French model provides constituency parsing, is it possible to use Stanza with the .jar model I found?
Could someone provide a command to use with the .jar model? The docs give this example, which requires a .gz model (probably installed with CoreNLP?):

    java -Xmx2g -cp "*" edu.stanford.nlp.parser.nndep.DependencyParser \
        -model edu/stanford/nlp/models/parser/nndep/UD_French.gz \
        -tagger.model edu/stanford/nlp/models/pos-tagger/french-ud.tagger \
        -tokenized -textFile example.txt -outFile example.txt.out

EDIT: the command above works (I only had to place the .jar file in the working directory), but it does not produce a constituency tree. The parser's README explains why:

The only provided French constituency parser is a shift-reduce parser. At this time running the shift-reduce parser on French text requires running a pipeline with the full Stanford CoreNLP package.

I managed to obtain a tree by using the whole CoreNLP package and this command (with the .jar file in the directory):

    java -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP -props StanfordCoreNLP-french.properties -annotators tokenize,ssplit,pos,parse -file example.txt -outputFormat text

(doc: https://stanfordnlp.github.io/CoreNLP/parse.html)
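
To call the full CoreNLP pipeline from Python without any wrapper library, a minimal sketch is to build the same command and run it with subprocess. The helper names build_corenlp_command and run_corenlp and the corenlp_dir parameter are hypothetical; the sketch assumes Java is on the PATH and that corenlp_dir contains both the CoreNLP jars and the French models jar.

```python
import subprocess
from pathlib import Path


def build_corenlp_command(input_file, corenlp_dir=".", output_format="text"):
    """Return the argument list for the French CoreNLP pipeline command."""
    # Assumption: corenlp_dir holds the CoreNLP jars and the French models jar.
    classpath = str(Path(corenlp_dir) / "*")
    return [
        "java", "-cp", classpath,
        "edu.stanford.nlp.pipeline.StanfordCoreNLP",
        "-props", "StanfordCoreNLP-french.properties",
        "-annotators", "tokenize,ssplit,pos,parse",
        "-file", str(input_file),
        "-outputFormat", output_format,
    ]


def run_corenlp(input_file, corenlp_dir="."):
    """Run the pipeline and raise CalledProcessError if Java exits with an error."""
    subprocess.run(build_corenlp_command(input_file, corenlp_dir), check=True)


if __name__ == "__main__":
    run_corenlp("example.txt")
```

Passing the arguments as a list avoids shell quoting issues; the "*" classpath wildcard is expanded by Java itself, not by the shell.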

It would now be great to be able to use this model from Python through Stanza...
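
One possible route, not tested here: Stanza ships a client, stanza.server.CoreNLPClient, that starts a CoreNLP Java server and queries it from Python. This does not load the .jar model into Stanza's own neural pipeline; it only wraps the Java pipeline. The sketch below assumes that CORENLP_HOME points to a CoreNLP directory that also contains the French models jar, and that the "french" properties shortcut selects the French defaults. The function names parse_french and to_bracketed, and the timeout and memory values, are illustrative.

```python
import os


def to_bracketed(node):
    """Render a ParseTree-like node (with .value and .child) as a bracketed string."""
    children = list(node.child)
    if not children:
        return node.value  # leaf: the token itself
    return "(" + node.value + " " + " ".join(to_bracketed(c) for c in children) + ")"


def parse_french(text, corenlp_home=None):
    """Return one bracketed constituency tree per sentence, via a local CoreNLP server."""
    if corenlp_home is not None:
        # Assumption: this directory holds the CoreNLP jars and the French models jar.
        os.environ["CORENLP_HOME"] = corenlp_home
    # Imported lazily so that to_bracketed stays usable without stanza installed.
    from stanza.server import CoreNLPClient

    with CoreNLPClient(
        properties="french",
        annotators=["tokenize", "ssplit", "pos", "parse"],
        timeout=60000,   # milliseconds, illustrative value
        memory="4G",     # illustrative value
        be_quiet=True,
    ) as client:
        document = client.annotate(text)
        return [to_bracketed(sentence.parseTree) for sentence in document.sentence]


if __name__ == "__main__":
    for tree in parse_french("Le chat dort sur le canapé."):
        print(tree)
```

The annotator list mirrors the command line that produced a tree above; whether the server needs further French-specific annotators may depend on the CoreNLP version.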

(N.B.: I also found these questions very useful: "Benepar for syntactic segmentation" and "Stanford NLP : Constituency parser in French".)

Many thanks in advance!

Answers (0)