Manohar Swamynathan

Reputation: 2095

R Error with openNLPmodels.pt for Portuguese

I'm running into an error while trying to use openNLPmodels.pt to POS-tag a Portuguese sentence. The English model, openNLPmodels.en, works fine with English sentences.

Appreciate any help.

R Code

# R Code #
install.packages("openNLPmodels.pt", repos = "http://datacube.wu.ac.at/", type="source") 

library(openNLP)
library(NLP)
library(openNLPmodels.pt)

s <- paste("Um esquilo preto raro se tornou um visitante regular de um jardim suburbano.")

# For reference here is the English version of sentence #
# s <- paste("A rare black squirrel has become a regular visitor to a suburban garden.")
###

## Sentence token annotations.
sent_token_annotator <- Maxent_Sent_Token_Annotator(language = "pt", probs = FALSE, model ='openNLPmodels.pt')

# Code End #

Error

# Error #
Error in .jnew("java.io.FileInputStream", model) : 
  java.io.FileNotFoundException: openNLPmodels.pt (The system cannot find the file specified)

Upvotes: 0

Views: 1030

Answers (2)

Paula Fortuna

Reputation: 11

I found a crude workaround, but it works:

1) Download the Portuguese model (I took pt-pos-maxent.bin) from here: http://opennlp.sourceforge.net/models-1.5/

2) Replace the English model at the following path with the downloaded file, keeping the English file name: depends-on-your-computer\R\win-library\3.4\openNLPdata\models\en-pos-maxent.bin

After these steps, run the command:

word_token_annotator <- Maxent_Word_Token_Annotator(language = "en")

as described here: How to use OpenNLP to get POS tags in R?

The tagger then performs badly on English and well on Portuguese ;)
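A less invasive variant of the same idea, as a sketch only: the `model` argument of the openNLP annotators is documented as a path to a Maxent model file, so the downloaded file can probably be passed in directly instead of overwriting the English one. The path `C:/models/pt-pos-maxent.bin` and the helper name `pos_tag_with_model` are hypothetical, and the sentence and word tokenizers here still use the English defaults, as in the command above.

```r
library(NLP)
library(openNLP)

# Hypothetical helper: POS-tag `text` with an explicitly chosen model file.
pos_tag_with_model <- function(text, pos_model) {
  # Fail early with a clear message if the model file is not where we expect.
  stopifnot(file.exists(pos_model))
  s <- as.String(text)

  # POS tagging needs sentence and word annotations first.
  # Assumption: the default (English) tokenizers are good enough here.
  a <- NLP::annotate(s, list(Maxent_Sent_Token_Annotator(),
                             Maxent_Word_Token_Annotator()))

  # Pass the downloaded Portuguese model by path instead of replacing
  # en-pos-maxent.bin inside the openNLPdata package.
  a <- NLP::annotate(s, Maxent_POS_Tag_Annotator(model = pos_model), a)

  w <- subset(a, type == "word")
  data.frame(token = s[w],
             tag = sapply(w$features, `[[`, "POS"),
             stringsAsFactors = FALSE)
}

# Usage (hypothetical path, adjust to your machine):
# pos_tag_with_model("Um esquilo preto raro se tornou um visitante regular.",
#                    "C:/models/pt-pos-maxent.bin")
```

This leaves the English model untouched, so English tagging keeps working alongside the Portuguese one.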

Upvotes: 1

Rodrigo Araujo

Reputation: 41

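I tried modifying the code you posted, and the problem seems to be the argument model = 'openNLPmodels.pt'. The model argument expects the path to a model file, not a package name, which is why Java reports that it cannot find a file called openNLPmodels.pt.

If you set model = NULL, it might work, as in: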

sent_token_annotator <- Maxent_Sent_Token_Annotator(language = "pt", probs = FALSE, model =NULL)

When model is NULL, the default model for the language you select is used, so it should be OK.

Note, though, that the values you passed for the second and third arguments are the defaults, so you could simply omit them.

The problem I'm having is with the Parse_Annotator command, but that's a separate issue that I'll post here soon.
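To show how the default-model approach extends from sentence detection to POS tagging, here is a minimal sketch. It assumes openNLPmodels.pt is installed as in the question and that it ships default sentence, word, and POS models for "pt"; the function name `tag_portuguese` is made up for illustration.

```r
library(NLP)
library(openNLP)

# Hypothetical helper: tag a Portuguese text using the default "pt" models.
# Assumes the openNLPmodels.pt package is installed.
tag_portuguese <- function(text) {
  s <- as.String(text)

  # With model left at its default (NULL), each annotator looks up the
  # default model for the requested language.
  sent_ann <- Maxent_Sent_Token_Annotator(language = "pt")
  word_ann <- Maxent_Word_Token_Annotator(language = "pt")
  pos_ann  <- Maxent_POS_Tag_Annotator(language = "pt")

  # Run the three annotators in order: sentences, then words, then POS tags.
  a <- NLP::annotate(s, list(sent_ann, word_ann, pos_ann))

  # Keep only the word annotations and pull out their POS feature.
  w <- subset(a, type == "word")
  data.frame(token = s[w],
             tag = sapply(w$features, `[[`, "POS"),
             stringsAsFactors = FALSE)
}

# Usage:
# tag_portuguese("Um esquilo preto raro se tornou um visitante regular de um jardim suburbano.")
```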

Upvotes: 0
