Reputation: 498
I'm designing an Android application that relies heavily on natural language processing. I selected OpenNLP since it seems to offer what I need, wrote a few classes to encapsulate tokenization, POS tagging, etc., and tested them in a standard Java setting with no issues.
My problem seems to be with the Android file system. OpenNLP calls for a training file to initialize the data model behind each class. However, the constructors for these classes seem to expect a very specific InputStream: when I manage to reference these files successfully, I get either an error about access permissions (I've added permissions for reading from and writing to external storage) or an error stating "The profile data stream has an invalid format!"
I'm at a loss. The standard input stream methods provided by the Android Context class don't work, because the streams they return are rejected as having an invalid format, and attempting to access the files manually with my own input streams brings up permission problems. I've even tried copying the files at run time from the res folder into another file and then reloading it with a normal FileInputStream, but this again leads to the invalid format problem.
Below is the method used to access the files, and an example method for initializing one of the models (they're all fairly uniform). If anybody has an idea what's going on, or if anybody has gotten OpenNLP to work in the Android environment, a little help would be greatly appreciated!
File Access Method:
    protected FileInputStream importIfNotExists(){
        FileInputStream input = null;
        if(mContext != null){
            File file = new File(getDirectory(), getFilePath());
            if(file.exists()){ //Create input stream from file.
                try {
                    Log.d("Analysis Tool", "Accessing file");
                    //Crashes here if it exists
                    input = new FileInputStream(file);
                }
                catch (FileNotFoundException e) {
                    Log.d("Speech Analysis Tool", "File not found: " + getFilePath());
                    input = null;
                }
            }
            else{ //Import resource file, then get input stream
                InputStream stream = null;
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                int sample = 0;
                try {
                    Log.d("Analysis Tool", "Loading raw resource");
                    stream = mContext.getResources().openRawResource(mResId);
                    Log.d("Analysis Tool", "Creating file to be written to.");
                    file.createNewFile();
                    Log.d("Analysis Tool", "Reading bytes from resource.");
                    sample = stream.read();
                    while(sample != -1){
                        bytes.write(sample);
                        sample = stream.read();
                    }
                    stream.close();
                    Log.d("Analysis Tool", "Creating file: " + getFilePath());
                    FileOutputStream output = new FileOutputStream(file, false);
                    Log.d("Analysis Tool", "Writing bytes to " + getFilePath());
                    bytes.writeTo(output);
                    bytes.close();
                    output.close();
                    Log.d("Analysis Tool", "Retrieving input stream for new file");
                    input = new FileInputStream(file);
                    //the input passed from this is typically of an invalid format
                }
                catch (IOException e) {
                    //getLocalizedMessage() may be null, so log the exception itself
                    Log.e("Speech Analysis Tool", "IOException with: " + getFilePath(), e);
                    input = null;
                }
            }
        }
        return input;
    }
Model Initialization:
    @Override
    protected void initializeTool(FileInputStream input) throws InvalidFormatException, IOException{
        if(input == null){
            Log.e("Speech Tokenizer", "Input stream for tokenizer is null");
            return;
        }
        TokenizerModel model = getModel(input);
        mTokenizer = new TokenizerME(model);
    }
getFilePath() simply returns the filename with its extension (like en_token.bin). getDirectory() has varied with little to no success, but it is intended to be the directory on external storage where I'd either access these files or load them at run time.
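To rule out the copy step as the source of the invalid format error, the resource-to-file import could be reduced to a buffered copy that reports how many bytes it wrote. This is only a sketch: the class name ModelCopy and the method copyToFile are hypothetical, and it assumes try-with-resources is available (Java 7 or later).

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of OpenNLP or the Android SDK.
public class ModelCopy {

    /**
     * Copies every byte of source into target, overwriting target,
     * and returns the number of bytes written. Both streams are closed
     * on exit, including when an exception is thrown.
     */
    public static long copyToFile(InputStream source, File target) throws IOException {
        long total = 0;
        byte[] buffer = new byte[8192];
        try (InputStream in = source;
             OutputStream out = new FileOutputStream(target, false)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
        }
        return total;
    }
}
```

On Android the source would be the stream from mContext.getResources().openRawResource(mResId). If the returned count matches the size of the original model file, the copy is probably intact and the format error is more likely coming from how the model is parsed.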
Upvotes: 3
Views: 1071
Reputation: 11
Add this line to your code:
System.setProperty("org.xml.sax.driver","org.xmlpull.v1.sax2.Driver");
This helped me; maybe it'll help you.
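A rough sketch of where that line might go, assuming it has to run before the first OpenNLP model is constructed (for example from Application.onCreate() or a static initializer). The class name SaxDriverSetup is made up for illustration.

```java
// Hypothetical wrapper around the property call above.
public class SaxDriverSetup {
    public static final String KEY = "org.xml.sax.driver";
    public static final String DRIVER = "org.xmlpull.v1.sax2.Driver";

    /** Sets the SAX driver system property; intended to be called once, before any model is loaded. */
    public static void apply() {
        System.setProperty(KEY, DRIVER);
    }
}
```

Calling SaxDriverSetup.apply() ahead of initializeTool(...) keeps the setting in one place instead of repeating it in each tool class.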
Upvotes: 1