AfonsoSalgadoSousa

Reputation: 405

Change tokenizer when loading Dependency Parsing model from AllenNLP

I am using a pretrained dependency parsing model from AllenNLP, namely this one.

I have the sentence How do I find work-life balance?, and when extracting the dependency graph, the tokenizer used by the AllenNLP model splits the sentence as ['How', 'do', 'I', 'find', 'work', '-', 'life', 'balance', '?']. However, I would prefer to split the sentence as ['How', 'do', 'I', 'find', 'work-life', 'balance', '?'] (notice work-life as a single word) as given by the function word_tokenize from NLTK.
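The difference between the two tokenizations can be illustrated with a small sketch. The regexes below are illustrative stand-ins, not the actual AllenNLP or NLTK tokenizer internals: one splits hyphenated compounds into separate tokens (roughly the behavior observed from the model), the other keeps them whole (like `word_tokenize` does here):

```python
import re

SENT = "How do I find work-life balance?"

def split_hyphens(text):
    # Hyphens become their own tokens, as the AllenNLP model's output shows
    return re.findall(r"\w+|[^\w\s]", text)

def keep_hyphens(text):
    # Hyphenated compounds stay whole, like NLTK's word_tokenize on this sentence
    return re.findall(r"\w+(?:-\w+)*|[^\w\s]", text)

print(split_hyphens(SENT))
# ['How', 'do', 'I', 'find', 'work', '-', 'life', 'balance', '?']
print(keep_hyphens(SENT))
# ['How', 'do', 'I', 'find', 'work-life', 'balance', '?']
```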

Is there a way to change the tokenizer used by the pretrained model? Was the model trained using a tokenizer that always splits the hyphenated words? I cannot find the answers in the official documentation. Thanks in advance for any help you can provide.

Upvotes: 0

Views: 103

Answers (1)

Dirk Groeneveld

Reputation: 2627

Two of the comments already describe the problem: The model learns parameters for the tokenization it was trained with. You can change the tokenization, but you have to re-train the model.

A lot of the time it's not so difficult to re-train a model, especially if you have access to good GPUs, but in this case it is. The model was trained on the Penn Treebank, which comes with its own tokenization scheme, so there is no place in the training config where you could swap one tokenizer for another: the source data is already tokenized.

More importantly, the annotations for the source data are based on the original tokenization. If the source data has annotations for three tokens ("work", "-", "life"), how would you come up with an annotation for "work-life"?
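To make the re-annotation problem concrete, here is a hypothetical sketch (the `merge_span` helper and the toy head indices are invented for illustration, not real PTB annotations). Collapsing "work", "-", "life" into one token forces you to pick which sub-token's head the merged token inherits and to remap every head index that pointed into or past the span; the sketch picks the sub-token whose head lies outside the span, and it sidesteps the equally awkward question of which dependency *label* the merged token should keep:

```python
def merge_span(tokens, heads, start, end):
    """Merge tokens[start:end] into one token.

    The merged token inherits the head of the sub-token whose head lies
    outside the span; all other head indices are remapped.
    heads are 0-based indices into tokens, with -1 for the root.
    """
    merged_head = next(h for h in heads[start:end] if not (start <= h < end))
    merged = "".join(tokens[start:end])

    def remap(h):
        if start <= h < end:   # pointed inside the span -> point at merged token
            return start
        if h >= end:           # shifted left by the collapsed tokens
            return h - (end - start - 1)
        return h               # before the span (or root): unchanged

    new_tokens = tokens[:start] + [merged] + tokens[end:]
    new_heads = ([remap(h) for h in heads[:start]]
                 + [remap(merged_head)]
                 + [remap(h) for h in heads[end:]])
    return new_tokens, new_heads

tokens = ["How", "do", "I", "find", "work", "-", "life", "balance", "?"]
heads  = [3, 3, 3, -1, 6, 6, 7, 3, 3]   # toy head indices, not real annotations
print(merge_span(tokens, heads, 4, 7))
# (['How', 'do', 'I', 'find', 'work-life', 'balance', '?'], [3, 3, 3, -1, 5, 3, 3])
```

Even this toy version has to make an arbitrary choice about head inheritance, which is why re-annotating the treebank consistently is the hard part.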

These problems are solvable, but it would be complicated and probably not worth your time.

Upvotes: 0
