Removing duplicates from a csv word frequency list

Question

Currently I am stumped with this word frequency list. I have almost got my end result, which is printing each word and its count, but I can't seem to get rid of the duplicates. If someone could help me with this last part I would be grateful.

This is what I have so far:

import csv

input_file = input()
##contents of input_file are -- hello,cat,man,hey,dog,boy,Hello,man,cat,woman,dog,Cat,hey,boy

with open(input_file, 'r') as csvfile:
    csvfile = csv.reader(csvfile)
    
    count = 0
    
    for line in csvfile:
        for word in line:
            count = line.count(word)
            ##I am trying to print the words and count without any duplicates
            print(word, count)

Ralubrusto · Accepted Answer

You can use a dictionary, since it doesn't allow duplicate keys: each word becomes a key and its running count is the value. Take a look:

with open(input_file, 'r') as csvfile:
    csvfile = csv.reader(csvfile)
    
    my_words = dict()
    
    for line in csvfile:
        for word in line:
            try:
                # If it's duplicated, add one
                my_words[word] += 1
            except KeyError:
                # If it's the first occurrence, set as one
                my_words[word] = 1
    # Print each distinct word once, with its total count
    for word, count in my_words.items():
        print(word, count)
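
As an alternative to the try/except pattern, the standard library's collections.Counter does the same bookkeeping. The following is only a sketch: the helper name count_words and its ignore_case flag are made up for illustration, and an in-memory io.StringIO holding the sample row from the question stands in for the real file. The question doesn't say whether "hello" and "Hello" should be merged, so case folding is off by default.

```python
import csv
import io
from collections import Counter


def count_words(rows, ignore_case=False):
    """Count every word across all CSV rows (hypothetical helper)."""
    counts = Counter()
    for row in rows:
        for word in row:
            word = word.strip()
            if ignore_case:
                # Assumption: treat "Hello" and "hello" as the same word
                word = word.lower()
            if word:  # skip empty fields
                counts[word] += 1
    return counts


# In-memory stand-in for the input file, using the sample row from the question
sample = io.StringIO("hello,cat,man,hey,dog,boy,Hello,man,cat,woman,dog,Cat,hey,boy\n")
word_counts = count_words(csv.reader(sample))

# Each distinct word is printed exactly once
for word, count in word_counts.items():
    print(word, count)
```

To read from the actual file, pass csv.reader(csvfile) inside the with block instead of the StringIO object. Calling count_words(..., ignore_case=True) would fold "cat" and "Cat" into a single entry, if that is what you want.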
