Reputation: 45
I have a .csv file with a column of messages I have collected, and I want a frequency list of every word in that column. Here is what I have so far; I am not sure where I have made a mistake, so any help would be appreciated. Edit: the expected output is the full list of words and their counts (without duplicates), written out to another .csv file.
import csv
from collections import Counter
from collections import defaultdict

output_file = 'comments_word_freqency.csv'
input_stream = open('comments.csv')
reader = csv.reader(input_stream, delimiter=',')
reader.next() #skip header
csvrow = [row[3] for row in reader] #Get the fourth column only

with open(output_file, 'rb') as csvfile:
    for row in reader:
        freq_dict = defaultdict(int) # the "int" part
        # means that the VALUES of the dictionary are integers.
        for line in csvrow:
            words = line.split(" ")
            for word in words:
                word = word.lower() # ignores case type
                freq_dict[word] += 1

        writer = csv.writer(open(output_file, "wb+")) # this is what lets you write the csv file.
        for key, value in freq_dict.items():
            # this iterates through your dictionary and writes each pair as its own line.
            writer.writerow([key, value])
Upvotes: 1
Views: 16233
Reputation: 769
I recently ran the code proposed by SAMO and hit some issues with Python 3.6, so I am posting a working version (a few lines changed from SAMO's code) in case it helps others and saves them time.
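import csv

words = []
with open('data.csv', 'rt', newline='') as csvfile:
    reader = csv.reader(csvfile)
    next(reader)  # skip the header row
    for col in reader:
        csv_words = col[0].split(" ")  # messages are in the first column here
        for word in csv_words:
            words.append(word)

words_counted = []  # one (word, count) tuple per occurrence of each word
# 'w' overwrites the file on each run; newline='' avoids blank rows on Windows
with open('frequency_result.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',')
    for word in words:
        word_count = words.count(word)
        words_counted.append((word, word_count))
    # set() drops the repeated tuples so each word is written once
    writer.writerows(sorted(set(words_counted)))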
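Calling words.count() once per word rescans the whole list every time, which gets slow on a large file. As a rough alternative sketch (not SAMO's method), collections.Counter does the counting in a single pass and never produces duplicates. The function names count_words and write_word_frequencies, and the column parameter, are made up for illustration; it also lowercases words and splits on any whitespace, which is an assumption about what you want.

```python
import csv
from collections import Counter


def count_words(rows, column=0):
    """Count lowercased, whitespace-separated words in one column of each row."""
    counts = Counter()
    for row in rows:
        if len(row) > column:  # skip empty or short rows
            counts.update(row[column].lower().split())
    return counts


def write_word_frequencies(in_path, out_path, column=0):
    """Read in_path, skip its header, and write one 'word,count' row per unique word."""
    with open(in_path, newline='') as src:
        reader = csv.reader(src)
        next(reader, None)  # skip the header row, if there is one
        counts = count_words(reader, column)
    with open(out_path, 'w', newline='') as dst:
        # most_common() yields (word, count) pairs, highest count first
        csv.writer(dst).writerows(counts.most_common())
    return counts
```

With the file names from this answer, the call would be write_word_frequencies('data.csv', 'frequency_result.csv'); for the file in the question, pass column=3.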
Upvotes: 2
Reputation: 456
The code you uploaded is all over the place, but I think this is what you're getting at. It builds a list of (word, count) tuples, where the count is the number of times the word appeared in the original file.
import csv  # note: Python 2 code (reader.next(), binary file modes)

words = []
with open('comments.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile)
    reader.next()  # skip the header row
    for row in reader:
        csv_words = row[3].split(" ")  # fourth column holds the messages
        for i in csv_words:
            words.append(i)

words_counted = []
for i in words:
    x = words.count(i)
    words_counted.append((i, x))

# write this to a csv file
with open('output.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(words_counted)
Then, to get rid of the duplicates in the list, call set() on it before writing it out:
set(words_counted)
Your output will look like this:
'this', 2
'is', 1
'your', 3
'output', 5
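To see what set() does to the list of tuples, here is a small in-memory sketch. The sample sentence is made up for illustration, and sorted() is only there because a set has no guaranteed order.

```python
# Made-up sample text standing in for the words read from the csv column
words = "this is your output this your your".split()

# Same approach as above: one (word, count) tuple per occurrence
words_counted = [(w, words.count(w)) for w in words]

# set() collapses the repeated tuples; sorted() makes the order predictable
unique_counts = sorted(set(words_counted))
print(unique_counts)
# [('is', 1), ('output', 1), ('this', 2), ('your', 3)]
```

Each word now appears once, so unique_counts can be passed straight to writer.writerows().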
Upvotes: 0