Ming

How to understand this Python code?

This code is from the book Learning Python, and it is used to sum the columns of a text file whose values are separated by commas. I really can't understand lines 7, 8 & 9. Thanks for the help. Here is the code:

filename='data.txt'
sums={}
for line in open(filename):
    cols=line.split(',')
    nums=[int(col) for col in cols]
    for(ix, num) in enumerate(nums):
        sums[ix]=sums.get(ix, 0)+num
for key in sorted(sums):
    print(key, '=', sums[key])

Answers (2)

Hugh Bothwell

It looks like the input file contains rows of comma-separated integers. This program prints out the sum of each column.

You've mixed up the indentation, which changes the meaning of the program, and it wasn't terribly nicely written to begin with. Here it is with lots of comments:

filename='data.txt'    # name of the text file

sums = {}              # dictionary of { column: sum }
                       #   starts empty, because you don't know in advance how many columns there are

# for each line in the input file,
for line in open(filename):
    # split the line by commas, resulting in a list of strings
    cols = line.split(',')
    # convert each string to an integer, resulting in a list of integers
    nums = [int(col) for col in cols]

    # enumerate() numbers the items of a list, i.e.
    #   enumerate([7,8,9]) yields (0,7), (1,8), (2,9)
    # It's used here to figure out which column each value gets added to
    for ix, num in enumerate(nums):
        # sums.get(index, defaultvalue) is the same as sums[index] IF sums already has a value for index;
        # if not, sums[index] raises a KeyError, but .get returns defaultvalue
        # So this gets a running sum for the column if it exists, else 0;
        # then we add the new value and store it back to sums.
        sums[ix] = sums.get(ix, 0) + num

# Go through the sums in ascending order by column;
#   sorting makes the order explicit instead of relying on dict ordering
#   (arbitrary before Python 3.7, insertion order since then)
for key in sorted(sums):
    # and for each, print the column# and sum
    print(key, '=', sums[key])
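
To try the commented program without creating data.txt, here is a minimal sketch that runs the same loop over an in-memory file. The `column_sums` helper name and the sample rows are hypothetical, made up for illustration; `io.StringIO` simply stands in for `open(filename)`.

```python
import io

def column_sums(lines):
    """Sum each column of comma-separated integer rows (hypothetical helper)."""
    sums = {}                                  # { column index: running total }
    for line in lines:
        nums = [int(col) for col in line.split(',')]
        for ix, num in enumerate(nums):
            sums[ix] = sums.get(ix, 0) + num   # 0 the first time a column is seen
    return sums

# Made-up sample data standing in for the contents of data.txt
fake_file = io.StringIO("1,2,3\n3,4,0\n")
sums = column_sums(fake_file)
for key in sorted(sums):
    print(key, '=', sums[key])
```

Note that `int()` ignores the trailing newline on the last column, which is why the loop doesn't need to strip each line first.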

I would write it a bit differently, something like:

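from collections import Counter

sums = Counter()    # { column: running sum }; missing columns count as 0

with open('data.txt') as inf:
    for line in inf:
        values = [int(v) for v in line.split(',')]
        # Counter.update() adds values only when given a mapping; a bare
        # enumerate() would count each (index, value) tuple as a key instead
        sums.update(dict(enumerate(values)))

for col, total in sorted(sums.items()):    # .items(): .iteritems() is Python 2 only
    print("{}: {}".format(col, total))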
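
One detail worth checking before summing with `Counter`: per the `collections` documentation, `Counter.update()` adds the values when given a mapping, but when given any other iterable it counts that iterable's elements. This sketch contrasts the two, using made-up values.

```python
from collections import Counter

values = [7, 8, 9]    # made-up row of column values

# Non-mapping iterable: each (index, value) tuple is counted once
counted = Counter()
counted.update(enumerate(values))
print(counted)

# Mapping: the values are added to the existing per-column totals
summed = Counter()
summed.update(dict(enumerate(values)))
summed.update(dict(enumerate(values)))
print(summed)
```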

abarnert

Assuming you understand lines 1-6…

Line 7:

sums[ix]=sums.get(ix, 0)+num

sums.get(ix, 0) is the same as sums[ix], except that if ix is not in sums, it returns 0 instead of raising a KeyError. So, this is just like sums[ix] += num, except that it first sets the value to 0 if this is the first time you've seen ix.
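
A small sketch of that equivalence: both helper functions below (hypothetical names, made-up data) produce the same totals, and a plain `+=` on a missing key fails.

```python
def add_with_get(sums, ix, num):
    # .get() returns 0 when ix is missing, so the first value needs no special case
    sums[ix] = sums.get(ix, 0) + num

def add_with_check(sums, ix, num):
    # The same logic spelled out: start a column at 0 the first time it is seen
    if ix not in sums:
        sums[ix] = 0
    sums[ix] += num

a, b = {}, {}
for ix, num in [(0, 1), (1, 2), (0, 3)]:    # made-up (column, value) pairs
    add_with_get(a, ix, num)
    add_with_check(b, ix, num)
print(a, b)

# A bare += on a missing key raises KeyError, which is what .get() avoids
raised = False
try:
    empty = {}
    empty[0] += 5
except KeyError:
    raised = True
print("KeyError raised:", raised)
```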

So, it should be clear that by the end of this loop, sums[ix] is going to have the sum of all values in column ix.

This is a silly way to do this. As mgilson points out, you could just use defaultdict so you don't need that extra logic. Or, even more simply, you could just use a list instead of a dict, because this (indexing by sequential small numbers) is exactly what lists are for…
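
Roughly what the defaultdict suggestion could look like, as a sketch that assumes the same comma-separated integer rows; a made-up list of lines stands in for data.txt.

```python
from collections import defaultdict

lines = ["1,2,3\n", "3,4,0\n"]    # hypothetical sample rows

sums = defaultdict(int)            # int() returns 0, so a new column starts at 0
for line in lines:
    nums = [int(col) for col in line.split(',')]
    for ix, num in enumerate(nums):
        sums[ix] += num            # no .get() needed

for key in sorted(sums):
    print(key, '=', sums[key])
```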

Line 8:

for key in sorted(sums):

First, you can iterate over any dict as if it were a list or other iterable, and it has the same effect as iterating over sums.keys(). So, if sums looks like { 0: 4, 1: 6, 2: 3 }, you're going to iterate over 0, 1, 2.
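
A quick check of that claim, using the example dict from the sentence above:

```python
sums = {0: 4, 1: 6, 2: 3}

# Iterating over a dict directly yields its keys, the same as iterating over sums.keys()
direct = [key for key in sums]
via_keys = [key for key in sums.keys()]
print(direct, via_keys)
```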

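Except that dicts don't have a guaranteed order (at least before Python 3.7; newer versions keep insertion order, which isn't the same as sorted order in general). You may get 0, 1, 2, or you may get 1, 0, 2, or any other order.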

So, sorted(sums) returns a new list of the keys in sorted order, guaranteeing that you'll get 0, 1, 2 in that order.
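
To illustrate, here is a sketch where the keys are deliberately inserted out of order (made-up values):

```python
sums = {2: 3, 0: 4, 1: 6}    # keys inserted out of order on purpose

ordered = sorted(sums)       # a new list of the keys, in ascending order
print(ordered)
print(type(ordered).__name__)
print(sums)                  # the dict itself is not modified
```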

Again, this is silly, because if you just used a list in the first place, you'd get things in order.
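
The list alternative mentioned above could be sketched like this. It assumes rows of comma-separated integers; the sample rows (with deliberately varying lengths) and the logic for growing the list are illustrative choices, not something from the original answer.

```python
lines = ["1,2,3\n", "3,4\n", "0,0,0,5\n"]   # hypothetical rows; lengths vary on purpose

sums = []                                    # sums[ix] is the running total for column ix
for line in lines:
    nums = [int(col) for col in line.split(',')]
    if len(nums) > len(sums):
        sums.extend([0] * (len(nums) - len(sums)))   # newly seen columns start at 0
    for ix, num in enumerate(nums):
        sums[ix] += num

# A list is already ordered by index, so no sorted() call is needed
for ix, total in enumerate(sums):
    print(ix, '=', total)
```

If every row is known to have the same number of columns, the extend step could be replaced by sizing the list from the first row.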

Line 9:

print(key, '=', sums[key])

This one should be obvious. If key iterates over 0, 1, 2, then this is going to print 0 = 4, 1 = 6, 2 = 3.

So, in other words, it's printing out each column number, together with the sum of all values in that column.
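
To confirm the exact text printed, this sketch captures the output of the loop for the example dict from the Line 8 explanation. `print()` separates its arguments with a single space by default, which is where the spaces around `=` come from.

```python
import io
from contextlib import redirect_stdout

sums = {0: 4, 1: 6, 2: 3}    # the example dict from the Line 8 explanation

buf = io.StringIO()
with redirect_stdout(buf):
    for key in sorted(sums):
        print(key, '=', sums[key])

output = buf.getvalue()
print(output, end='')
```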
