Word Frequency from a CSV Column in Python

Question

I have a .csv file with a column of messages I have collected, and I want a word frequency list of every word in that column. Here is what I have so far; I am not sure where I have made a mistake, so any help would be appreciated. Edit: The expected output is the entire list of words and their counts (without duplicates), written out to another .csv file.

import csv
from collections import Counter
from collections import defaultdict

output_file = 'comments_word_freqency.csv'
input_stream = open('comments.csv')
reader = csv.reader(input_stream, delimiter=',')
reader.next() #skip header
csvrow = [row[3] for row in reader] #Get the fourth column only

with open(output_file, 'rb') as csvfile:
    for row in reader:
        freq_dict = defaultdict(int) # the "int" part
                                    # means that the VALUES of the dictionary are integers.
        for line in csvrow:
            words = line.split(" ")
            for word in words:
                word = word.lower() # ignores case type
                freq_dict[word] += 1

        writer = csv.writer(open(output_file, "wb+")) # this is what lets you write the csv file.
        for key, value in freq_dict.items():
                        # this iterates through your dictionary and writes each pair as its own line.
            writer.writerow([key, value])

SAMO · Accepted Answer

The code you uploaded is all over the place, but I think this is what you're getting at. It builds a list of (word, count) pairs, one pair for each word in the fourth column of the original file, and writes them to a new .csv file.

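import csv

words = []
# Python 3: open csv files in text mode with newline=''
with open('comments.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    next(reader)  # skip header
    for row in reader:
        csv_words = row[3].split(" ")  # fourth column only
        for i in csv_words:
            words.append(i)

words_counted = []
for i in words:
    x = words.count(i)
    words_counted.append((i, x))

# write this to csv file; set() drops the repeated (word, count) pairs
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(set(words_counted))

Calling set() on the list is what gets rid of the duplicates. words_counted holds one (word, count) pair per occurrence, so a word that appears three times is listed three times until set() collapses the identical pairs into one:

set(words_counted)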

Your output file will look like this, one word and its count per row (a set is unordered, so the rows come out in no particular order):

this,2
is,1
your,3
output,5
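
Since the question already imports Counter, here is a minimal sketch of the same idea using collections.Counter, which counts each word in a single pass instead of calling words.count() once per word. It assumes Python 3 and that the messages are in the fourth column, as in the question. The function names count_words and write_word_counts and the sample rows are made up for illustration; lowercasing follows the question's code, not the answer above.

```python
import csv
from collections import Counter


def count_words(rows, column=3):
    """Count case-insensitive word frequencies in one column of CSV rows."""
    counts = Counter()
    for row in rows:
        # str.split() with no argument splits on any run of whitespace,
        # so repeated spaces do not produce empty "words".
        counts.update(row[column].lower().split())
    return counts


def write_word_counts(input_path, output_path, column=3):
    # newline='' is what the csv module expects for files in Python 3.
    with open(input_path, newline='') as src:
        reader = csv.reader(src)
        next(reader, None)  # skip the header row, if there is one
        counts = count_words(reader, column)
    with open(output_path, 'w', newline='') as dst:
        # most_common() yields (word, count) pairs, most frequent first.
        csv.writer(dst).writerows(counts.most_common())


# Example on in-memory rows (hypothetical data, not from the question):
sample_rows = [
    ['1', 'ann', '2020', 'This is your output'],
    ['2', 'bob', '2020', 'your output output'],
]
print(count_words(sample_rows).most_common(2))
# prints [('output', 3), ('your', 2)]
```

With the file names from the question, the call would be write_word_counts('comments.csv', 'comments_word_freqency.csv'). Because a Counter keeps one entry per word, there are no duplicates to remove afterwards, and most_common() gives the rows a stable order.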
