Reputation: 6605
Hi, is there an efficient way to tag parts of speech in very large files?
import pandas as pd
import collections
import nltk

# word_tokenize expects a string, not a DataFrame; join the text column
# first (assuming the text lives in a column named "text" -- adjust as needed)
text = " ".join(pandas_dataframe["text"].astype(str))
tokens = nltk.word_tokenize(text)
tag1 = nltk.pos_tag(tokens)
counts = collections.Counter(tag for _, tag in tag1)  # Counter, not counter
I am trying to find the most common parts of speech in a file, and I don't know of a better way of doing this.
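Since the whole file never has to be in memory at once, one option is to stream it line by line and keep only a running tally. This is a minimal sketch; `count_pos` and the toy tagger are illustrative names, not NLTK API, and with NLTK you would pass `lambda s: nltk.pos_tag(nltk.word_tokenize(s))` as the tagger.

```python
import collections

def count_pos(lines, tag):
    """Stream lines through a tagger, keeping only a running Counter.

    `tag` is any callable mapping a string to (token, tag) pairs;
    with NLTK it would be lambda s: nltk.pos_tag(nltk.word_tokenize(s)).
    """
    counts = collections.Counter()
    for line in lines:
        counts.update(t for _, t in tag(line))
    return counts

# Toy rule-based tagger so the sketch runs without NLTK's model files:
toy_tag = lambda s: [(w, "VBG" if w.endswith("ing") else "NN")
                     for w in s.split()]

print(count_pos(["running dog", "sleeping cat"], toy_tag).most_common())
```

Reading the file with `for line in open(path)` (or a pandas `read_csv(..., chunksize=...)` iterator) keeps memory flat regardless of file size.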
Upvotes: 0
Views: 286
Reputation: 431
Typically you need to work around three things: the Python-level for loop, potentially high memory load, and potentially high CPU load.
Here's an example of distributed part-of-speech tagging using Python and execnet.
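The linked execnet recipe isn't reproduced here, but the same idea can be sketched with only the standard library: fan the lines out across worker processes, tag and count inside each worker, and merge the partial Counters. The tagger below is a stand-in so the sketch runs without NLTK's model files; real code would call `nltk.pos_tag(nltk.word_tokenize(line))` in the worker, and the function names are illustrative.

```python
import collections
import multiprocessing

def tag_and_count(line):
    # Runs in a worker process. Stand-in tagger; in real code this
    # would be nltk.pos_tag(nltk.word_tokenize(line)).
    tagged = [(w, "VBG" if w.endswith("ing") else "NN")
              for w in line.split()]
    return collections.Counter(t for _, t in tagged)

def parallel_pos_counts(lines, processes=4):
    # Distribute lines to the pool and fold the partial Counters
    # together, so no process ever holds all the tagged tokens.
    total = collections.Counter()
    with multiprocessing.Pool(processes) as pool:
        for partial in pool.imap_unordered(tag_and_count, lines,
                                           chunksize=1000):
            total.update(partial)
    return total

if __name__ == "__main__":
    print(parallel_pos_counts(["running dog", "sleeping cat"]).most_common())
```

Because only small per-line Counters cross process boundaries, this spreads the CPU load without blowing up memory; swapping the pool for execnet gateways gives the distributed version the answer refers to.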
Upvotes: 1