Marc G.

Reputation: 141

Feed TreeTagger in R with text from a string rather than from a file

I use TreeTagger from R through the koRpus package. Calling the treetag function requires a filename pointing to the text to be processed. However, I would like to pass a string rather than a filename, because I have to do some preliminary processing on the text first. I guess the text has to go through a file because the function wraps a call to an external script. Since I am looping over 10,000 texts, I would like to avoid wasting time writing each one to disk and keep everything in memory instead. Can I avoid this? Thanks.

Upvotes: 1

Views: 419

Answers (2)

vivsingh

Reputation: 41

As indicated in the documentation:

format: Either "file" or "obj", depending on whether you want to scan files or analyze the text in a given object, like a character vector. If the latter, it will be written to a temporary file (see file).

Given this, we can call treetag() directly on a character vector:

treetag(as.vector(yourinput), format = "obj")

Internally, koRpus writes the vector to a temporary file, and TreeTagger then reads and analyzes that temporary file.
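For the loop over many texts mentioned in the question, this could look roughly like the sketch below. The helper name tag_strings, the TreeTagger path and the "en" preset are placeholders I made up for illustration, and depending on your koRpus version you may also need the matching language package (for example koRpus.lang.en) loaded first. Note that each call still goes through a temporary file behind the scenes.

```r
# Sketch only: tag a vector of in-memory strings with format = "obj".
# tt_path and preset are assumptions; point them at your own TreeTagger setup.
tag_strings <- function(texts, tt_path = "~/bin/treetagger", preset = "en") {
  if (!requireNamespace("koRpus", quietly = TRUE)) {
    stop("the koRpus package is required")
  }
  lapply(texts, function(txt) {
    tagged <- koRpus::treetag(
      txt,
      format = "obj",            # analyze the object instead of a file path
      treetagger = "manual",     # configure TreeTagger via TT.options
      lang = preset,
      TT.options = list(path = tt_path, preset = preset)
    )
    # Return the token/tag data frame of the tagged text object
    koRpus::taggedText(tagged)
  })
}
```

Calling tag_strings(my_texts) would then return one data frame per input string, with any preprocessing done on the strings beforehand.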

Upvotes: 1

Spacedman

Reputation: 94222

No. Or not really. As you suspect, the external script needs a file. From the docs for the file argument:

Either a connection or a character vector, valid path to a file, containing the text to be analyzed. If file is a connection, its contents will be written to a temporary file, since TreeTagger can't read from R connection objects.

So it's got to write the text to a file for the external TreeTagger binary to read. If you don't write that file yourself, then the treetag function does it for you. Either way, the text ends up in a file.
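What that amounts to, whoever does the writing, is roughly the pattern below. This is a minimal sketch in base R, not koRpus's actual code: with_temp_text is a made-up helper name, and the function passed as fn stands in for whatever hands the path to TreeTagger (for example a wrapper around treetag with format = "file").

```r
# Sketch: write a string to a temporary file, hand the path to a consumer,
# and remove the file afterwards, even if the consumer fails.
with_temp_text <- function(text, fn) {
  path <- tempfile(fileext = ".txt")
  on.exit(unlink(path), add = TRUE)  # clean up when the function exits
  writeLines(text, path)             # one element of 'text' per line
  fn(path)
}

# Stand-in consumer: read the file back instead of calling TreeTagger.
lines_read <- with_temp_text("Some preprocessed text.", readLines)
```

The cost per text is one small write and one delete, which is usually minor next to starting the external tagger process itself.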

If TreeTagger can read from a Unix named pipe, or fifo, then you might be able to stream text to it on the fly.

The only other option would be to see if the TreeTagger source can be linked with R in some way so that you can call one of its subroutines directly, passing an R object. I don't even know if this is written in Java or C++ or whatever, but it might be a big job anyway.

Upvotes: 1
