Count frequency of words under given index in a file

Question

I am trying to count the occurrences of the words at a specific index in each line of my file and print the result as a dictionary.

def count_by_fruit(file_name="file_with_fruit_data.txt"):
    with open(file_name, "r") as file:
        content_of_file = file.readlines()
        dict_of_fruit_count = {}
        for line in content_of_file:
            line = line[0:-1]
            line = line.split("\t")
            for fruit in line:
                fruit = line[1]
                dict_of_fruit_count[fruit] = dict_of_fruit_count.get(fruit, 0) + 1
    return dict_of_fruit_count


print(count_by_fruit())

Output: {'apple': 6, 'banana': 6, 'orange': 3}

I get this output, but it doesn't count the frequency of the words correctly. After searching around, I couldn't find a proper solution. Could anyone help me identify my mistake?

My file has the following content (the fields are separated by tabs; they are shown with spaces here because Stack Overflow alters the formatting):

I am line one with apple from 2018
I am line two with orange from 2017
I am line three with apple from 2016
I am line four with banana from 2010
I am line five with banana from 1999

dawg · Accepted Answer

You are looping too many times over the same line: the inner loop counts line[1] once for every field in the line. Notice that each count in your output is 3 times what you expect.
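
To make the over-counting concrete, here is a minimal sketch that runs the question's inner loop on a single in-memory line. The field layout (three tab-separated fields with the fruit at index 1) is an assumption, since the real file is not shown with its tabs:

```python
# Hypothetical line with three tab-separated fields; the fruit is at index 1.
line = "line one\tapple\t2018"
fields = line.split("\t")

counts = {}
for fruit in fields:       # runs once per field, so 3 times here
    fruit = fields[1]      # the loop variable is overwritten with the same value
    counts[fruit] = counts.get(fruit, 0) + 1

print(counts)  # {'apple': 3}
```

Dropping the inner loop and reading index 1 once per line gives a count of 1 for this line instead of 3.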

Also, in Python you do not need to read the entire file into memory first. Just iterate over the file object line by line.

Try:

def count_by_fruit(file_name="file_with_fruit_data.txt"):
    with open(file_name, "r") as f_in:
        dict_of_fruit_count = {}
        for line in f_in:
            fruit = line.split("\t")[1]
            dict_of_fruit_count[fruit] = dict_of_fruit_count.get(fruit, 0) + 1
    return dict_of_fruit_count

Which can be further simplified to:

def count_by_fruit(file_name="file_with_fruit_data.txt"):
    with open(file_name) as f_in:
        dict_of_fruit_count = {}
        for fruit in (line.split('\t')[1] for line in f_in):
            dict_of_fruit_count[fruit] = dict_of_fruit_count.get(fruit, 0) + 1
        return dict_of_fruit_count

Or, if you can use Counter:

from collections import Counter 

def count_by_fruit(file_name="file_with_fruit_data.txt"):
    with open(file_name) as f_in:
        return dict(Counter(line.split('\t')[1] for line in f_in))
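
If the index can vary, or the file may contain blank or short lines, a slightly more defensive variant might look like the sketch below. The function name count_by_column, its index and sep parameters, and the sample lines are all hypothetical and not part of the original question. It also strips the trailing newline, which matters when the wanted field is the last one on the line:

```python
from collections import Counter


def count_by_column(lines, index=1, sep="\t"):
    """Count the values found at position `index` of each separated line."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) > index:  # skip blank or too-short lines
            counts[fields[index]] += 1
    return dict(counts)


# Assumed layout: description, fruit, year (tab-separated).
sample = [
    "line one\tapple\t2018\n",
    "line two\torange\t2017\n",
    "line three\tapple\t2016\n",
    "line four\tbanana\t2010\n",
    "line five\tbanana\t1999\n",
]

print(count_by_column(sample))  # {'apple': 2, 'orange': 1, 'banana': 2}
```

Because the function accepts any iterable of lines, an open file object can be passed in place of the list, for example count_by_column(f_in) inside a with open(file_name) as f_in: block.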
