runningbirds

Reputation: 6605

POS tagging in nltk

Hi, is there an efficient way to tag parts of speech in very large files?

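 import collections
 import nltk
 import pandas as pd

 # word_tokenize expects a string, so join the rows first
 # (assumes pandas_dataframe is a Series of text)
 text = " ".join(pandas_dataframe.astype(str))
 tokens = nltk.word_tokenize(text)
 tag1 = nltk.pos_tag(tokens)
 counts = collections.Counter(tag for word, tag in tag1)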

I am trying to find the most common parts of speech in a file and don't know of a better way to do this.

Upvotes: 0

Views: 286

Answers (1)

leavesof3

Reputation: 431

Typically there are three things to work around: the Python-level for loop, potentially high memory use, and potentially high CPU load.

Here's an example of distributed part-of-speech tagging using Python and execnet.
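If distributing the work is more than you need, a single-process version that keeps memory flat might look roughly like this. It is only a sketch: `count_tags`, `nltk_tag_batch` and the batch size are names and values I made up for illustration, and it assumes the file is line-oriented text and that the NLTK tokenizer and tagger data are already downloaded. It reads the input lazily, tags one batch at a time with `nltk.pos_tag_sents`, and keeps only the running `Counter`.

```python
import collections
from itertools import islice


def count_tags(lines, tag_batch, batch_size=1000):
    """Stream lines, tag them batch by batch, and accumulate tag counts.

    `lines` is any iterable of strings (for example an open file), so the
    whole input never has to sit in memory at once.
    `tag_batch` takes a list of strings and returns one list of
    (word, tag) pairs per string.
    """
    counts = collections.Counter()
    it = iter(lines)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        for tagged in tag_batch(batch):
            counts.update(tag for _, tag in tagged)
    return counts


def nltk_tag_batch(batch):
    """Hypothetical helper: tokenize and tag one batch of lines with NLTK."""
    # Imported lazily so count_tags itself has no NLTK dependency.
    import nltk

    return nltk.pos_tag_sents([nltk.word_tokenize(line) for line in batch])


# Example use ("big_file.txt" is a placeholder path):
#
#     with open("big_file.txt", encoding="utf-8") as f:
#         print(count_tags(f, nltk_tag_batch).most_common(10))
```

Tagging line by line means a sentence that wraps across lines gets split, which can change a few tags at the boundaries; for a "most common tags" count that is usually tolerable. Because the tagger is passed in as a function, the same loop can later be pointed at a worker pool or a remote tagger without changing the counting logic.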

Upvotes: 1
