I want to get constituency trees for French documents. I've tried to install several tools, but all the ones I found are quite old and I couldn't get them to work.
Benepar: it looks very interesting, but it doesn't seem compatible with Python > 3.9 and requires an old torch version (see https://github.com/grimavatar/benepar/blob/master/setup.py). Otherwise I'd like to test the CTL library, which seems nice (https://stanfordnlp.github.io/CoreNLP/parser-standalone.html). I also tried SuPar, which could be an alternative, but it's also old and I couldn't get it working (https://github.com/yzhangcs/parser).
Stanford CoreNLP / Stanza: the most recent version of the Stanza French model doesn't implement constituency parsing (https://stanfordnlp.github.io/stanza/constituency.html); I didn't find another model on Hugging Face. So I'm now trying to use the CoreNLP standalone parser, which has a French model (https://stanfordnlp.github.io/CoreNLP/parser-standalone.html) available as a .jar file: stanford-corenlp-4.2.1-models-french.jar.
If there is no French model that provides constituency parsing, is it possible to use Stanza with the .jar model I found?
Could someone provide a command to use with the .jar model? The docs give this example, which requires a .gz model (probably installed with CoreNLP?):

    java -Xmx2g -cp "*" edu.stanford.nlp.parser.nndep.DependencyParser \
        -model edu/stanford/nlp/models/parser/nndep/UD_French.gz \
        -tagger.model edu/stanford/nlp/models/pos-tagger/french-ud.tagger \
        -tokenized -textFile example.txt -outFile example.txt.out
EDIT: the above command works (I only had to place the .jar file in the working directory), but it does not produce a constituency tree. This is explained in the parser's README:
The only provided French constituency parser is a shift-reduce parser. At this time running the shift-reduce parser on French text requires running a pipeline with the full Stanford CoreNLP package.
I managed to obtain a tree by using the whole CoreNLP package (with the .jar file in the directory) and this command:

    java -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP -props StanfordCoreNLP-french.properties -annotators tokenize,ssplit,pos,parse -file example.txt -outputFormat text
(doc: https://stanfordnlp.github.io/CoreNLP/parse.html)
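As a stopgap, the same pipeline command could be launched from Python. This is only a minimal sketch: the function names `build_corenlp_command` and `run_corenlp` and the `corenlp_dir` parameter are made up for illustration, and it assumes `java` is on the PATH and that `corenlp_dir` contains the CoreNLP jars together with the French models .jar.

```python
import subprocess
from pathlib import Path


def build_corenlp_command(text_file, corenlp_dir="."):
    """Build the argument list for the French CoreNLP pipeline.

    The flags are the ones from the command line that worked for me;
    corenlp_dir is an assumed folder holding all the CoreNLP .jar files.
    """
    # Java expands a trailing "*" in the classpath to every jar in that folder.
    classpath = str(Path(corenlp_dir) / "*")
    return [
        "java", "-cp", classpath,
        "edu.stanford.nlp.pipeline.StanfordCoreNLP",
        "-props", "StanfordCoreNLP-french.properties",
        "-annotators", "tokenize,ssplit,pos,parse",
        "-file", str(text_file),
        "-outputFormat", "text",
    ]


def run_corenlp(text_file, corenlp_dir="."):
    """Run the pipeline and raise CalledProcessError if java exits with an error."""
    return subprocess.run(build_corenlp_command(text_file, corenlp_dir), check=True)
```

Calling `run_corenlp("example.txt")` from the CoreNLP directory should then behave like the command line above; it does not go through Stanza at all.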
Now it would be great to be able to integrate the model with Stanza in Python...
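In the meantime, one possible Python-side workaround is to read a bracketed tree back into nested lists. The sketch below assumes the tree is available as a Penn-Treebank-style bracketed string in which every opening bracket is followed by a label; `parse_bracketed` and `leaves` are hypothetical helper names, and the sample sentence is hand-written for illustration, not real parser output.

```python
import re


def parse_bracketed(text):
    """Turn "(NP (DET le) (NC chat))" into ["NP", ["DET", "le"], ["NC", "chat"]]."""
    # Tokens are "(", ")" or any run of characters without spaces or brackets.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read_node():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        if pos >= len(tokens) or tokens[pos] in "()":
            raise ValueError("expected a label at token %d" % pos)
        node = [tokens[pos]]  # first element is the constituent label
        pos += 1
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(read_node())  # nested constituent
            else:
                node.append(tokens[pos])  # word under a POS tag
                pos += 1
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets")
        pos += 1  # consume the closing ")"
        return node

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


def leaves(tree):
    """Collect the words of a parsed tree from left to right."""
    words = []
    for child in tree[1:]:
        if isinstance(child, list):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


# Hand-written sample, not actual CoreNLP output.
sample = "(ROOT (SENT (NP (DET Le) (NC chat)) (VN (V dort))))"
tree = parse_bracketed(sample)
print(tree[0], leaves(tree))
```

Nested lists keep the sketch free of dependencies; a tree class from an NLP library could replace them if one is already installed.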
(N.B.: I also found these questions very useful: Benepar for syntactic segmentation; Stanford NLP: Constituency parser in French)
Many thanks in advance!