Cryssie

Reputation: 3175

Splitting a sentence into two and storing them into a defaultdict as key and value in Python

I have some questions about defaultdict and Counter. I have a text file with one sentence per line. I want to split each sentence in two at the first space and store the parts in a dictionary, with the first substring as the key and the second substring as the value. The goal is to get the total number of sentences that share the same key.

Text file format:
id1 This is an example
id3 Hello World
id1 This is also an example
id4 Hello Hello World
.
.

This is what I have tried but it doesn't work. I have looked at Counter but it's a bit tricky in my situation.

try:
    openFileObject = open('test.txt', "r")
    try:             

        with openFileObject as infile:
            for line in infile:

                #Break up line into two strings at first space                    
                tempLine = line.split(' ' , 1)

                classDict = defaultdict(tempLine)         
                for tempLine[0], tempLine[1] in tempLine: 
                    classDict[tempLine[0]].append(tempLine[1]) 

            #Get the total number of keys  
            len(classDict)

            #Get value for key id1 (should return 2) 

    finally:
        print 'Done.'
        openFileObject.close()
except IOError:
    pass

Is there a way to do this without splitting up the sentences and storing them as tuples in a huge list before attempting using Counter or defaultdict? Thanks!

EDIT: Thanks to all who answered. I finally found out where I went wrong in this. I edited the program with all the suggestions given by everyone.

from collections import defaultdict

openFileObject = open(filename, "r")
tempList = []

with openFileObject as infile:
    for line in infile:
        # Break up line into two strings at first space
        tempLine = line.split(' ', 1)
        tempList.append(tempLine)

classDict = defaultdict(list)  # My error was here: I used tempLine instead of list
for key, value in tempList:
    classDict[key].append(value)

print len(classDict)          # total number of keys
print len(classDict['id1'])   # number of sentences stored under key id1
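
For what it's worth, the grouping can probably also be done in a single pass without the intermediate tempList. The following is only a minimal sketch: `sample_lines` is a made-up stand-in for the open file (any iterable of lines behaves the same way), and `str.partition` is swapped in for `split(' ', 1)` so that a line without a space does not break the unpacking.

```python
from collections import defaultdict

# Hypothetical stand-in for the open file, modeled on the format in the question.
sample_lines = [
    "id1 This is an example\n",
    "id3 Hello World\n",
    "id1 This is also an example\n",
    "id4 Hello Hello World\n",
]

classDict = defaultdict(list)
for line in sample_lines:
    # partition splits at the first space and always returns three parts,
    # so a line with no space simply yields an empty sentence.
    key, _, sentence = line.rstrip('\n').partition(' ')
    if key:  # skip blank lines
        classDict[key].append(sentence)

print(len(classDict))         # number of distinct keys
print(len(classDict['id1']))  # number of sentences stored under id1
```

Replacing `sample_lines` with the file object inside a `with open(...)` block should give the same result for a real file.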

Upvotes: 3

Views: 325

Answers (3)

John La Rooy

Reputation: 304335

Using collections.Counter to "get a total number of sentences that share the same key."

from collections import Counter
with openFileObject as infile:
    print Counter(x.split()[0] for x in infile)

will print

Counter({'id1': 2, 'id4': 1, 'id3': 1})

If you want to store a list of all the lines, your main mistake is here

classDict = defaultdict(tempLine)

For this pattern, you should be using

classDict = defaultdict(list)

But there's no point storing all those lines in a list if you're just intending to take the length.
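
To illustrate why the first form fails, here is a small sketch. The sample line is assumed data, not taken from a real file. defaultdict expects a zero-argument callable (or None) as its first argument, and a list instance such as tempLine is not callable, whereas the `list` type is.

```python
from collections import defaultdict

# Assumed sample standing in for one line of the text file.
tempLine = "id1 This is an example\n".split(' ', 1)

# A list instance is not callable, so defaultdict rejects it with TypeError.
try:
    defaultdict(tempLine)
    error = None
except TypeError as exc:
    error = exc

print(error is not None)

# Passing the list type works: list() is called to build an empty list
# the first time a missing key is accessed.
classDict = defaultdict(list)
classDict[tempLine[0]].append(tempLine[1])
print(len(classDict['id1']))
```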

Upvotes: 2

Alex L

Reputation: 8925

Full example of defaultdict (and improved way of displaying classDict)

from collections import defaultdict

classDict = defaultdict(int)

with open('text.txt') as f:
    for line in f:
        first_word = line.split()[0]
        classDict[first_word] += 1

print(len(classDict))
# items() works on both Python 2 and 3; iteritems() is Python 2 only
for key, value in classDict.items():
    print('{}: {}'.format(key, value))

Upvotes: 1

falsetru

Reputation: 369224

dict.get(key, 0) returns the current accumulated count for key, or 0 if key is not in the dict yet.

classDict = {}

with open('text.txt') as infile:
    for line in infile:
        key = line.split(' ' , 1)[0]
        classDict[key] = classDict.get(key, 0) + 1

    print(len(classDict))
    for key in classDict:
        print('{}: {}'.format(key, classDict[key]))

http://docs.python.org/3/library/stdtypes.html#dict.get
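
As a rough sanity check, the dict.get pattern should produce the same counts as defaultdict(int) and Counter. This is just a sketch; the `keys` list is made-up data standing in for the first word of each line.

```python
from collections import Counter, defaultdict

# Assumed sample keys standing in for the first word of each line.
keys = ['id1', 'id3', 'id1', 'id4']

get_counts = {}
dd_counts = defaultdict(int)
for key in keys:
    # get() returns 0 for a key that has not been seen yet.
    get_counts[key] = get_counts.get(key, 0) + 1
    # defaultdict(int) creates the missing entry as int(), which is 0.
    dd_counts[key] += 1

counter_counts = Counter(keys)

# All three mappings hold the same key/count pairs.
print(get_counts == dict(dd_counts) == dict(counter_counts))
```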

Upvotes: 1
