Reputation: 19
I am trying to learn some Python on my own and came across this problem:
Input text file contents:
10280341|2012-10-03 19:11:06.390|Sami|abc|Crossword|70
10280343|2012-10-03 19:15:32.173|Sami|aaa|Sudoku|30
10280355|2012-10-04 19:18:32.173|miami|bbb|Chaircar|15
10280366|2012-10-04 19:19:32.173|miami|bob|Avista|35
Expected output:
2012-10-03 Sami|2|100
2012-10-04 miami|2|50
I know this can be done with string parsing and matching, but I have no idea where to start. Any links or pointers to a similar problem would be very helpful. Thanks in advance.
Upvotes: 1
Views: 87
Reputation: 142106
You could use itertools.groupby, as has already been suggested, or use the csv.reader object (which is already an iterator over rows) together with a collections.defaultdict to aggregate the value column:
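import csv
from collections import defaultdict

# Map (date, name) -> list of values from the last column
summary = defaultdict(list)
with open('testdata.txt', newline='') as fin:
    for row in csv.reader(fin, delimiter='|'):
        date = row[1].split(' ')[0]  # keep only the date part of the timestamp
        summary[(date, row[2])].append(int(row[5]))

with open('testdata.out', 'w', newline='') as fout:
    csvout = csv.writer(fout, delimiter='|')
    for who, what in sorted(summary.items()):
        # writes e.g. "2012-10-03 Sami|2|100"
        csvout.writerow([' '.join(who), len(what), sum(what)])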
If you're looking at more complicated cross-tabulation or pivoting, it may well be worth looking at pandas, a very useful data-analysis library built on NumPy.
Upvotes: 1
Reputation: 798516
Use csv to read the file. Use itertools.groupby() to group the rows after sorting them; groupby() only merges consecutive rows with equal keys, so the rows must first be sorted by the same key. Use sum() with a generator expression to add up the values in each group.
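A minimal sketch of the sort-then-groupby approach described above, assuming the exact pipe-delimited layout shown in the question. The helper names `group_key` and `summarize` are hypothetical, and the sample is held in memory via io.StringIO only so the example is self-contained. Because a groupby group can be iterated only once and both the count and the total are needed, the values are collected into a list instead of being summed directly from a generator expression.

```python
import csv
import io
from itertools import groupby

def group_key(row):
    # Hypothetical helper: (date part of the timestamp, name), e.g. ("2012-10-03", "Sami")
    return row[1].split(' ')[0], row[2]

def summarize(lines):
    """Yield (date, name, count, total) for each (date, name) group."""
    rows = csv.reader(lines, delimiter='|')
    # groupby() only merges *consecutive* rows with equal keys, so sort by the same key first
    for (date, name), group in groupby(sorted(rows, key=group_key), key=group_key):
        values = [int(row[5]) for row in group]  # a group can be consumed only once
        yield date, name, len(values), sum(values)

sample = """10280341|2012-10-03 19:11:06.390|Sami|abc|Crossword|70
10280343|2012-10-03 19:15:32.173|Sami|aaa|Sudoku|30
10280355|2012-10-04 19:18:32.173|miami|bbb|Chaircar|15
10280366|2012-10-04 19:19:32.173|miami|bob|Avista|35
"""

for date, name, count, total in summarize(io.StringIO(sample)):
    print(f"{date} {name}|{count}|{total}")
# prints:
# 2012-10-03 Sami|2|100
# 2012-10-04 miami|2|50
```

For a real file, pass an open file object instead, e.g. `with open('testdata.txt', newline='') as f:` and then iterate over `summarize(f)`. Note that sorting loads all rows into memory; the defaultdict approach avoids the sort if that matters.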
Upvotes: 1