user1720510

Reputation: 19

String Grouping Based On Name

I am trying to learn some Python on my own and I came across this problem:

Input text file contents:

10280341|2012-10-03 19:11:06.390|Sami|abc|Crossword|70
10280343|2012-10-03 19:15:32.173|Sami|aaa|Sudoku|30
10280355|2012-10-04 19:18:32.173|miami|bbb|Chaircar|15
10280366|2012-10-04 19:19:32.173|miami|bob|Avista|35

Expected output:

2012-10-03 Sami|2|100
2012-10-04 miami|2|50

I know this can be done through string parsing and matching, but I have no idea where to start. Any links or pointers to similar problems would be very helpful. TIA

Upvotes: 1

Views: 87

Answers (2)

Jon Clements

Reputation: 142106

You could use itertools.groupby, as has already been suggested, or combine a csv.reader object (which already yields rows lazily as you iterate over it) with a collections.defaultdict to aggregate the value column:

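import csv
from collections import defaultdict

# Map (date, name) -> list of the values found in the last column
summary = defaultdict(list)
with open('testdata.txt', newline='') as fin:
    for row in csv.reader(fin, delimiter='|'):
        # row[1] looks like '2012-10-03 19:11:06.390'; keep only the date part
        summary[(row[1].split(' ')[0], row[2])].append(int(row[5]))

# On Python 3, open in text mode with newline='' (Python 2 used 'wb' here)
with open('testdata.out', 'w', newline='') as fout:
    csvout = csv.writer(fout, delimiter='|')
    # .items() on Python 3; on Python 2 this was .iteritems()
    for who, what in summary.items():
        csvout.writerow([' '.join(who), len(what), sum(what)])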

If you need more complicated cross-tabulation or pivoting, it may well be worth having a look at pandas, a very useful library built on numpy.

Upvotes: 1

Ignacio Vazquez-Abrams

Reputation: 798516

Use csv to read the file. Use itertools.groupby() to group the rows after sorting them. Use sum() to sum up each of the values in the grouped rows, via a generator expression.
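
A minimal sketch of those steps, assuming Python 3 and the column layout shown in the question. The names SAMPLE, group_key and summarize are made up for illustration, and io.StringIO stands in for the real input file so the snippet runs on its own:

```python
import csv
import io
from itertools import groupby

# Sample rows copied from the question; in practice you would pass an
# open file object (opened with newline='') instead of io.StringIO.
SAMPLE = """\
10280341|2012-10-03 19:11:06.390|Sami|abc|Crossword|70
10280343|2012-10-03 19:15:32.173|Sami|aaa|Sudoku|30
10280355|2012-10-04 19:18:32.173|miami|bbb|Chaircar|15
10280366|2012-10-04 19:19:32.173|miami|bob|Avista|35
"""


def group_key(row):
    """Return (date, name): the date part of column 1 plus column 2."""
    return (row[1].split(' ')[0], row[2])


def summarize(lines):
    """Return one 'date name|count|total' string per (date, name) group."""
    # groupby only merges adjacent rows, so sort by the same key first.
    rows = sorted(csv.reader(lines, delimiter='|'), key=group_key)
    result = []
    for (date, name), group in groupby(rows, key=group_key):
        group = list(group)  # materialize: we need both a count and a sum
        total = sum(int(row[5]) for row in group)  # generator expression
        result.append('%s %s|%d|%d' % (date, name, len(group), total))
    return result


for line in summarize(io.StringIO(SAMPLE)):
    print(line)
```

The sort matters: without it, rows for the same date and name that are not next to each other in the file would end up in separate groups.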

Upvotes: 1
