Gordon

Reputation: 11

koRpus--tokenize command on large folder of word files

I have made some headway in getting koRpus to analyze my data, but there are lingering problems.

The 'tokenize' command seems to work--kind of. I run the following line of code:

word <- tokenize("/Users/gdballingrud/Desktop/WPSCASES 1/", lang="en")

And it produces a 'Large krp.text' object. However, the size of that object (5.6 MB) is far less than the size of the folder I reference in the code (260 MB). Further, when I use the 'readability' command to generate text-analysis scores, like so:

all <- readability(word)

it returns a single set of readability scores for the whole krp.text object (one score per readability measure, I mean), rather than one set per file.

I need readability scores for each Word file in my folder, and I need to use koRpus (alternatives like quanteda don't generate some of the readability measures I need, such as LIX and Kuntzsch's Text-Redundanz-Index).

Is anyone experienced enough with koRpus to point out what I have done wrong? The recurring problems are: 1) getting the 'tokenize' command to recognize each file in my folder separately, and 2) getting readability scores for each individual file.
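In case it helps clarify what I'm after, here is a sketch of the per-file approach I have been attempting. It assumes the Word documents have already been converted to plain text (I gather koRpus reads text files, not .docx binaries), and the pattern and folder path are just placeholders for my setup:

```r
# Hypothetical sketch: tokenize and score each file separately,
# instead of passing the whole folder to tokenize() at once.
library(koRpus)
library(koRpus.lang.en)  # English language support package

# List the individual text files (assumes prior .docx -> .txt conversion)
files <- list.files("/Users/gdballingrud/Desktop/WPSCASES 1/",
                    pattern = "\\.txt$", full.names = TRUE)

# One tokenized object and one readability result per file
scores <- lapply(files, function(f) {
  tok <- tokenize(f, lang = "en")
  readability(tok)
})
names(scores) <- basename(files)
```

Something along these lines should, if I understand the package correctly, yield one readability result per file rather than one for the merged text, but I'm not certain this is the intended usage.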

Thanks, Gordon

Upvotes: 1

Views: 38

Answers (0)
