Ivan

Reputation: 347

File Parsing problem

I have a question about my homework problem. Here is the problem: Write a program which reads a text file called input.txt containing an arbitrary number of lines of the form "<city>, <country>", records this information in a dictionary, and finally prints to the screen the list of countries represented in the file along with the number of cities listed for each.

For example, if input.txt contained the following:

New York, US
Angers, France
Los Angeles, US
Pau, France
Dunkerque, France
Mecca, Saudi Arabia

The program would output the following (in some order):

Saudi Arabia : 1
US : 2
France : 3

Here is my code:

def addword(w,wcDict):
    if w in wcDict:
        wcDict[w] +=1
    else:
        wcDict[w]= 1

import string
def processLine(line, wcDict):
    wordlist= line.strip().split(",")
    for word in wordlist:
        word= word.lower().strip()
        word=word.strip(string.punctuation)
        addword(wordlist[1], wcDict)

def prettyprint(wcDict):
    valkeylist= [(val,key) for key,val in wcDict.items()]
    valkeylist.sort(reverse = True)
    for val,key in valkeylist:
        print '%-12s    %3d'%(key,val)

def main():
    wcDict={}
    fobj= open('prob1.txt','r')
    for line in fobj:
        processLine(line, wcDict)
    prettyprint (wcDict)

main()

My code counts each country twice. Can you please help me find out why?

Thank you

Upvotes: 4

Views: 177

Answers (2)

Yayati Sule

Reputation: 1631

from collections import Counter

# Keep only the country: the text after the last comma on each line.
with open("file.txt") as f:
    countries = [line.rsplit(",", 1)[1].strip() for line in f if "," in line]

counts = Counter(countries)
for country, n in counts.items():
    print("%s : %d" % (country, n))

I hope this answers your question.

Upvotes: 0

Matt Bridges

Reputation: 49395

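In the processLine function, you have an extraneous for loop. wordlist always contains two entries, the city and the country, so the code inside your for loop (including the addword call) is executed twice per line, and that is why every country is counted twice. The loop variable word is never used by addword, which always receives wordlist[1]. You can delete the for statement entirely, keep the single addword call, and it should work as you expect.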
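For illustration, here is a minimal sketch of what the program might look like once the loop is removed. It is not the asker's exact code: the names add_word, process_line and count_countries are hypothetical, it reads from a list of lines rather than a file so it can be run as-is, and it keeps the country's original capitalisation (an assumption made so the result matches the expected output in the question).

```python
def add_word(w, wc_dict):
    # Increment the count for w, starting from 0 if it has not been seen yet.
    wc_dict[w] = wc_dict.get(w, 0) + 1


def process_line(line, wc_dict):
    # Split "City, Country" on the comma; no loop over the parts is needed.
    parts = line.strip().split(",")
    if len(parts) < 2:
        return  # skip blank or malformed lines
    country = parts[-1].strip()  # hypothetical choice: keep original case
    add_word(country, wc_dict)   # called exactly once per line


def count_countries(lines):
    # Build the country -> number-of-cities dictionary from an iterable of lines.
    wc_dict = {}
    for line in lines:
        process_line(line, wc_dict)
    return wc_dict


# Sample data taken from the question; a real run would iterate over the file.
sample = [
    "New York, US",
    "Angers, France",
    "Los Angeles, US",
    "Pau, France",
    "Dunkerque, France",
    "Mecca, Saudi Arabia",
]

counts = count_countries(sample)
for country, n in counts.items():
    print("%s : %d" % (country, n))
```

To read from the file instead, the list could be replaced with an open file object, for example by passing open('input.txt') to count_countries, since iterating over a file yields its lines.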

Upvotes: 2
