Reputation: 244
I tried to count the number of occurrences of the word "the" in a .csv file, but when I run the following code, it returns 0. (test.csv is located here)
I only search the first column of this file.
import csv
import nltk
tweet = []
for t in csv.DictReader(open('test.csv'), delimiter=','):
    tweet.append(t['text'])
tweet_text = nltk.Text(tweet)
print tweet_text.count("the")
Thanks in advance for your help.
Upvotes: 3
Views: 2008
Reputation: 368894
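nltk.Text treats each element of the list as one token, so a whole tweet stored as a single string never equals "the". Split the text field into words using str.split, and use list.extend instead of list.append so that each word becomes its own element. Also lowercase the text, unless you only want to count the lowercase "the".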
>>> nltk.Text(['the world The words']).count('the')
0
>>> nltk.Text(['the', 'world', 'The', 'words']).count('the')
1
Complete code:
import csv
import nltk
tweet = []
for t in csv.DictReader(open('test.csv'), delimiter=','):
    # lowercase the text, split it into words, and add each word as a token
    tweet.extend(t['text'].lower().split())
tweet_text = nltk.Text(tweet)
print tweet_text.count('the')
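For anyone on Python 3 who wants to try the idea without the original file, here is a minimal sketch. It assumes made-up sample rows in place of test.csv (only the 'text' column name comes from the question), and it uses a hypothetical helper, count_word, that calls plain list.count instead of nltk.Text.count; as far as I know both compare whole tokens in the same way.

```python
import csv
import io

# Hypothetical stand-in for test.csv; the rows are invented for illustration.
SAMPLE_CSV = """id,text
1,the world The words
2,Over the hills and far away
"""

def count_word(csv_file, word, column="text"):
    """Count case-insensitive occurrences of `word` in one CSV column."""
    tokens = []
    for row in csv.DictReader(csv_file):
        # Lowercase, split on whitespace, and add each word as its own token.
        tokens.extend(row[column].lower().split())
    return tokens.count(word.lower())

print(count_word(io.StringIO(SAMPLE_CSV), "the"))  # prints 3
```

Note that splitting on whitespace leaves punctuation attached, so a token such as "the," would not be counted. If that matters for your data, a real tokenizer such as nltk.word_tokenize is probably a better fit than str.split.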
Upvotes: 2