Preston May

Reputation: 15

Take certain words and print the frequency of each phrase/word?

I have a file that lists bands, each with an album and the year it was produced. I need to write a function that goes through this file, finds the distinct band names, and counts how many times each band appears in the file.

The way the file looks is like this:

Beatles - Revolver (1966)
Nirvana - Nevermind (1991)
Beatles - Sgt Pepper's Lonely Hearts Club Band (1967)
U2 - The Joshua Tree (1987)
Beatles - The Beatles (1968)
Beatles - Abbey Road (1969)
Guns N' Roses - Appetite For Destruction (1987)
Radiohead - Ok Computer (1997)
Led Zeppelin - Led Zeppelin 4 (1971)
U2 - Achtung Baby (1991)
Pink Floyd - Dark Side Of The Moon (1973)
Michael Jackson -Thriller (1982)
Rolling Stones - Exile On Main Street (1972)
Clash - London Calling (1979)
U2 - All That You Can't Leave Behind (2000)
Weezer - Pinkerton (1996)
Radiohead - The Bends (1995)
Smashing Pumpkins - Mellon Collie And The Infinite Sadness (1995)
.
.
.

The output has to be in descending order of frequency and look like this:

band1: number1
band2: number2
band3: number3

Here is the code I have so far:

def read_albums(filename) :

    file = open("albums.txt", "r")
    bands = {}
    for line in file :
        words = line.split()
        for word in words:
            if word in '-' :
                del(words[words.index(word):])
        string1 = ""
        for i in words :
            list1 = []

            string1 = string1 + i + " "
            list1.append(string1)
        for k in list1 :
            if (k in bands) :
                bands[k] = bands[k] +1
            else :
                bands[k] = 1


    for word in bands :
        frequency = bands[word]
        print(word + ":", len(bands))

I think there's an easier way to do this, but I'm not sure. Also, I'm not sure how to sort a dictionary by frequency. Do I need to convert it to a list?

Upvotes: 0

Views: 175

Answers (3)

Burhan Khalid

Reputation: 174708

You are right, there is an easier way, with Counter:

from collections import Counter

with open('bandfile.txt') as f:
    counts = Counter(line.split('-')[0].strip() for line in f if line.strip())

for band, count in counts.most_common():
    print("{0}: {1}".format(band, count))

What exactly is this doing: line.split('-')[0].strip() for line in f if line.strip()?

This line is a short form of the following loop:

temp_list = []
for line in f:
    if line.strip():  # skip blank lines (a bare "\n" is still truthy)
        bits = line.split('-')
        temp_list.append(bits[0].strip())

counts = Counter(temp_list)

Unlike the loop above, however, it doesn't create an intermediate list. Instead, it is a generator expression, a more memory-efficient way to step through items, and it is passed directly as the argument to Counter.
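To see the generator expression at work without a file on disk, here is a minimal sketch. The `lines` list is a stand-in for the open file (an assumption for illustration); its entries are copied from the question's sample, with one blank line added.

```python
from collections import Counter

# Stand-in for the open file: a few lines from the question, plus a blank one.
lines = [
    "Beatles - Revolver (1966)\n",
    "Nirvana - Nevermind (1991)\n",
    "\n",
    "Beatles - Abbey Road (1969)\n",
    "U2 - The Joshua Tree (1987)\n",
]

# Generator expression: band names are produced one at a time,
# so no intermediate list is built before Counter consumes them.
names = (line.split('-')[0].strip() for line in lines if line.strip())

counts = Counter(names)

# most_common() returns (band, count) pairs, highest count first.
for band, count in counts.most_common():
    print("{0}: {1}".format(band, count))
```

Beatles is printed first with a count of 2, and the blank line contributes nothing because `line.strip()` is empty for it.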

Upvotes: 2

caffreyd

Reputation: 1203

My approach is to use the split() method to break the file lines into a list of constituent tokens. Then you can grab the band name (first token in the list), and start adding the names to a dictionary to keep track of the counts:

import operator

def main():
  band_counts = {}

  # build a dictionary that adds each band as it is listed, then increments the count for re-lists
  with open("albums.txt", "r") as f:
    for line in f:
      line_items = line.split("-")  # break up the line into individual tokens
      band = line_items[0].strip()  # drop surrounding whitespace and the newline

      # don't want to add blank lines to the band list
      if not band:
        continue

      if band in band_counts:
        band_counts[band] += 1  # band already in the counts, increment the count
      else:
        band_counts[band] = 1   # band not in the counts yet, add it with a count of 1

  # create a list of results sorted by count, highest first
  sorted_list = sorted(band_counts.items(), key=operator.itemgetter(1), reverse=True)

  for item in sorted_list:
    print("{0}: {1}".format(item[0], item[1]))

if __name__ == "__main__":
  main()

Notes:

  1. I followed the advice of this answer to create the sorted results: Sort a Python dictionary by value
  2. If you are new to Python, check out Google's Python class. I found it very helpful when I was just getting started: https://developers.google.com/edu/python/?csw=1
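On the sorting step from note 1, here is a small sketch of sorting a dictionary by value in descending order, which is what the question asks for. The `band_counts` values are hard-coded for illustration; they match the portion of the file shown in the question, not the full file.

```python
import operator

# Illustrative counts only: taken from the visible sample lines in the question.
band_counts = {"Nirvana": 1, "Beatles": 4, "U2": 3, "Radiohead": 2}

# items() yields (band, count) pairs; itemgetter(1) selects the count as the
# sort key, and reverse=True puts the highest count first.
sorted_list = sorted(band_counts.items(), key=operator.itemgetter(1), reverse=True)

for band, count in sorted_list:
    print("{0}: {1}".format(band, count))
```

The dictionary itself is not reordered; `sorted()` returns a new list of tuples, so yes, the result is a list.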

Upvotes: 0

thierrybm

Reputation: 129

If you're looking for conciseness, use a "defaultdict" and "sorted":

from collections import defaultdict
bands = defaultdict(int)
with open('tmp.txt') as f:
    for line in f:
        band = line.split('-')[0].strip()
        if band:  # skip blank lines
            bands[band] += 1
for band, count in sorted(bands.items(), key=lambda t: t[1], reverse=True):
    print('%s: %d' % (band, count))
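One line in the question's sample, `Michael Jackson -Thriller (1982)`, has no space after the hyphen, so the choice of separator matters. A quick sketch comparing the two ways of splitting that line (the variable names are only for illustration):

```python
line = "Michael Jackson -Thriller (1982)"

# ' - ' does not occur in this line, so split() returns the whole line
# as a single piece and the "band" would be the entire line.
with_spaces = line.split(' - ')[0]

# Splitting on the bare hyphen and stripping whitespace recovers the band name.
bare_hyphen = line.split('-')[0].strip()

print(repr(with_spaces))
print(repr(bare_hyphen))
```

The trade-off is that splitting on a bare hyphen would cut short any band name that itself contains a hyphen, so which separator is safer depends on the data.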

Upvotes: 1
