Pymal

Reputation: 244

Count occurrences of a word in a CSV file in Python using NLTK

I tried to count the number of occurrences of the word "the" in a .csv file, but when I run the following code, it returns 0. (test.csv is located here)

I only search the first column of this file.

import csv
import nltk

tweet = []

for t in csv.DictReader(open('test.csv'), delimiter=','):
    tweet.append(t['text'])

tweet_text = nltk.Text(tweet)
print tweet_text.count("the")

Thanks in advance for your help.

Upvotes: 3

Views: 2008

Answers (1)

falsetru

Reputation: 368894

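Split the text field into words using str.split, and use list.extend instead of list.append so that the list holds individual words rather than whole tweets. nltk.Text.count only counts tokens that match exactly, and a whole tweet stored as a single token never equals "the". Also convert the text to lowercase, unless you only want to count the lowercase "the".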

>>> nltk.Text(['the world The words']).count('the')
0
>>> nltk.Text(['the', 'world', 'The', 'words']).count('the')
1

Complete code:

import csv
import nltk

tweet = []

for t in csv.DictReader(open('test.csv'), delimiter=','):
    tweet.extend(t['text'].lower().split()) # <-----------

tweet_text = nltk.Text(tweet)
print tweet_text.count('the')

Upvotes: 2
