Reputation: 1585
I have data from travel diaries which has been read in from a csv file. I have it set up as a dictionary with a bunch of lists. E.g.:
print diary['ID'][1] gives 123456789
print diary['TravelReferenceDay'][1] gives 1 for a Monday
I want to randomly select an ID from the array based on the day, e.g.:
random.choice(diary['ID']) if diary['TravelReferenceDay'] == 1
I can arrange the data by TravelReferenceDay in the csv file. I had tried the groupby method to split up the array:
groups = []
uniquekeys = []
for k, g in groupby(diary, diary['TravelReferenceDay']):
    groups.append(list(g)) # Store group iterator as a list
    uniquekeys.append(k)
But that gave the error:
TypeError: 'list' object is not callable
Could you suggest a way to achieve this? Thanks.
Upvotes: 0
Views: 372
Reputation: 12946
My solution, using a list comprehension:
In [1]: import random
...: diary = {'ID': ['11', '22', '33', '44', '55'], 'TravelReferenceDay': [1, 1, 2, 3, 1]}
...: monday_diary = [x for n, x in enumerate(diary['ID']) if diary['TravelReferenceDay'][n] == 1]
In [2]: monday_diary
Out[2]: ['11', '22', '55']
In [3]: random.choice(monday_diary)
Out[3]: '22'
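The same filter can be written with zip instead of enumerate, which avoids indexing into the second list by hand. This is only a sketch: the helper name ids_for_day is made up for illustration, and it assumes the two lists have the same length and line up row by row.

```python
import random

def ids_for_day(diary, day):
    """Return the IDs whose TravelReferenceDay equals `day` (hypothetical helper)."""
    pairs = zip(diary['ID'], diary['TravelReferenceDay'])
    return [id_ for id_, d in pairs if d == day]

# Same sample data as in the session above.
diary = {'ID': ['11', '22', '33', '44', '55'], 'TravelReferenceDay': [1, 1, 2, 3, 1]}

monday_ids = ids_for_day(diary, 1)
chosen = random.choice(monday_ids)  # one of the Monday IDs, picked at random
```

Note that random.choice raises IndexError on an empty list, so a day with no diaries needs its own check.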
Upvotes: 1
Reputation: 157344
The second argument to groupby is a callable that is invoked on successive items from the iterable first argument. You want to use operator.itemgetter('TravelReferenceDay'):
for k, g in groupby(diary, operator.itemgetter('TravelReferenceDay')):
    ...
This is equivalent to lambda x: x['TravelReferenceDay'].
Note that groupby expects the iterable to already be sorted by the key; each group contains only adjacent items that share the same key.
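However, this won't work on your data as it stands, because you've stored it as parallel lists: iterating over the diary dict yields its keys ('ID', 'TravelReferenceDay'), not rows. For ease of processing I'd advise converting it to a list of dicts: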
diary = [dict((k, diary[k][i]) for k in diary) for i in range(len(diary['ID']))]
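Putting the pieces together, a minimal sketch might look like the following. The sample data is invented for illustration, and the names rows, day_of and ids_by_day are not from your code; the rows are sorted first because groupby only merges adjacent items.

```python
import random
from itertools import groupby
from operator import itemgetter

# Invented sample data in the parallel-list layout from the question.
diary = {'ID': ['11', '22', '33', '44', '55'], 'TravelReferenceDay': [1, 1, 2, 3, 1]}

# One dict per row, as in the conversion above.
rows = [dict((k, diary[k][i]) for k in diary) for i in range(len(diary['ID']))]

# Sort by day so that equal keys are adjacent before grouping.
day_of = itemgetter('TravelReferenceDay')
rows.sort(key=day_of)

# Map each day to the list of IDs recorded on that day.
ids_by_day = {}
for day, group in groupby(rows, day_of):
    ids_by_day[day] = [row['ID'] for row in group]

chosen = random.choice(ids_by_day[1])  # a random Monday ID
```

Building ids_by_day once means later lookups for any day do not need to rescan the rows.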
Upvotes: 2