osi

Reputation: 13

Python script to find word frequencies of a given document

I am looking for a simple script that finds the frequency of each word in a given document, probably using the Porter stemmer.

Is there a library or simple script that does this?

Upvotes: 0

Views: 1433

Answers (2)

Roshan Mathews

Reputation: 5898

Counting the words is the easy part: use a collections.Counter or a plain dict, depending on what you need. If you get stuck on that, searching SO itself will turn up the answer.
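
For illustration, here is a minimal sketch of the collections.Counter approach. The function name word_frequencies and the tokenizing regex are assumptions made for this example, not something from the question; adjust the pattern to whatever you want to count as a word.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercased word to its count.

    The regex is an assumption for this sketch: a "word" is a run of
    ASCII letters, digits, or apostrophes.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


freqs = word_frequencies("The cat saw the other cat.")

# most_common(n) returns the n highest counts as (word, count) pairs
print(freqs.most_common(2))
```

To run it on a document, read the file into a string first and pass that string in.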

I think you also want the Porter Stemmer, which has a Python version at http://tartarus.org/~martin/PorterStemmer/python.txt
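
One way to combine the two steps is to pass the stemmer in as a function and count the stems instead of the raw words. In the sketch below, stemmed_frequencies and toy_stem are hypothetical names, and toy_stem is only a stand-in that strips a trailing "s"; it is not the Porter algorithm. Swap in a call to whichever Porter implementation you end up using.

```python
import re
from collections import Counter


def stemmed_frequencies(text, stem):
    """Count word stems in text; stem is any callable mapping str -> str."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(stem(word) for word in words)


def toy_stem(word):
    # Placeholder for illustration only, NOT the Porter algorithm:
    # drops one trailing "s" from words longer than three letters.
    if len(word) > 3 and word.endswith("s"):
        return word[:-1]
    return word


freqs = stemmed_frequencies("Cats chase cats; a cat naps.", toy_stem)

# "cats" and "cat" are now counted under the same key
print(freqs["cat"])
```

Keeping the stemmer as a parameter means the counting code does not change when you replace the placeholder with a real stemmer.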

Upvotes: 0

MattoTodd

Reputation: 15199

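Use NLTK:

import nltk

YOUR_STRING = "Your words"

# Split on whitespace to get the list of words
words = YOUR_STRING.split()
freq_dist = nltk.FreqDist(words)

# In NLTK 3, FreqDist is a Counter subclass, so keys() is not sorted by
# frequency and cannot be sliced; use most_common() instead.

# 50 most frequent, as (word, count) pairs
most_frequent = freq_dist.most_common(50)

# 50 least frequent: most_common() with no argument sorts every word by count
least_frequent = freq_dist.most_common()[-50:]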

Upvotes: 2
