Reputation: 93
I have a data file such as:
1 123 something else
2 234 something else
3 500 something else
.
.
.
1 891 something else
2 234 something else
3 567 something else
.
.
.
I am trying to end up with a file with:
1 1014
2 468
3 1067
That is, add the numbers in column 2 (or some other column) if the number in column 1 is the same. I believe reading the columns into a nested list and proceeding from there is the way to go but I have been struggling with that. Another approach I tried was creating a new file with the entries I am interested in:
for next in f.readlines():
output.write(next[0:1] + "," + next[3:6]+ "\n")
if not next:
break
with open(output,"r") as file:
data_list=[[int(x) for x in line.split(",")] for line in file]
print data_list
This returns
[[1, 123], [2, 234], [3, 500], [1, 891], [2, 234], [3, 567]]
I guess I could loop through that list and compare data_list[x][0] and add the values if they match but that does not seem like an elegant solution. Could anyone suggest a more elegant way of doing this? Especially, I have been struggling with summation of specific items in the nested list I end up with.
Upvotes: 2
Views: 162
Reputation: 1121914
Use a dictionary to track the sum; using a collections.defaultdict()
makes it a little easier to start keys at 0 if they haven't been seen before:
from collections import defaultdict
sums = defaultdict(int)
with open(filename) as f:
for line in f:
col1, col2, rest = line.split(None, 2)
sums[col1] += int(col2)
This reads your initial file, splits the line on whitespace 2 times to get the first two columns out, then sums the second column based on the first:
>>> from collections import defaultdict
>>> sample = '''\
... 1 123 something else
... 2 234 something else
... 3 500 something else
... 1 891 something else
... 2 234 something else
... 3 567 something else
... '''.splitlines()
>>> sums = defaultdict(int)
>>> for line in sample:
... col1, col2, rest = line.split(None, 2)
... sums[col1] += int(col2)
...
>>> sums
defaultdict(<type 'int'>, {'1': 1014, '3': 1067, '2': 468})
Upvotes: 4