Reputation: 17904
I have a MultiIndex Series that looks like this:
user_id  cookie  browser
1        1_1     [chrome45]
2        2_1     [IE 7]
2        2_2     [IE 7, IE 8]
There are two levels to this MultiIndex, user_id and cookie. The value is the browser.
What I want to do is count the number of browsers each user has used.
User 1 in this case used only one browser. User 2 used three browsers (IE 7 appears twice under different cookies, so I count it twice instead of once).
How can I loop through it and get a result like this:
r = defaultdict(int)
for user_id in multiIndex_series:
    for cookie in multiIndex_series[user_id]:
        # I don't know how to get user_id out of the MultiIndex series
        r[user_id] += len(multiIndex_series[user_id][cookie])
Upvotes: 1
Views: 3543
Reputation: 862701
You can use groupby on the user_id level with apply and a lambda function that returns the length of the flattened lists:
import pandas as pd

df = pd.DataFrame({'user_id': [1, 2, 2],
                   'cookie': ['1_1', '2_1', '2_2'],
                   'browser': [['chrome45'], ['IE 7'], ['IE 7', 'IE 8']]})
df = df.set_index(['user_id', 'cookie'])
print(df)
                     browser
user_id cookie
1       1_1       [chrome45]
2       2_1           [IE 7]
        2_2     [IE 7, IE 8]
from itertools import chain

# Flatten each user's lists into one sequence and count its elements
print(df.groupby(level='user_id')['browser']
        .apply(lambda x: len(list(chain.from_iterable(x)))))
user_id
1 1
2 3
Name: browser, dtype: int64
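Since only the total number of list elements per user is needed, the flattening step can arguably be skipped. A minimal sketch, assuming the same sample data as above (the variable name counts is just illustrative): take the length of each list with map(len), then sum those lengths per user_id.

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame({'user_id': [1, 2, 2],
                   'cookie': ['1_1', '2_1', '2_2'],
                   'browser': [['chrome45'], ['IE 7'], ['IE 7', 'IE 8']]})
df = df.set_index(['user_id', 'cookie'])

# Length of each browser list, then the sum of those lengths per user_id
counts = df['browser'].map(len).groupby(level='user_id').sum()
print(counts)
```

This should give the same per-user totals as the chain.from_iterable version, without building the intermediate flattened list.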
Instead of a lambda, you can use a custom function f, which is easier to test and debug:
def f(x):
    # Flatten the group's lists once, print the result for inspection
    flat = list(chain.from_iterable(x))
    print(flat)
    return len(flat)

print(df.groupby(level='user_id')['browser'].apply(f))
['chrome45']
['IE 7', 'IE 7', 'IE 8']
user_id
1    1
2    3
Name: browser, dtype: int64
If you need to loop over the Series, one possible solution is:
# items() replaces iteritems(), which was removed in pandas 2.0
for idx, val in df['browser'].items():
    print(idx)  # idx is a (user_id, cookie) tuple
    print(val)
(1, '1_1')
['chrome45']
(2, '2_1')
['IE 7']
(2, '2_2')
['IE 7', 'IE 8']
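To tie the loop back to the defaultdict attempt in the question, here is a rough sketch. It assumes a Series shaped like the one in the question; the names browsers and browser_list are made up for illustration. Because each key of a MultiIndex Series is a tuple, it can be unpacked into user_id and cookie directly in the for statement.

```python
from collections import defaultdict

import pandas as pd

# Illustrative Series with a (user_id, cookie) MultiIndex, as in the question
browsers = pd.Series(
    [['chrome45'], ['IE 7'], ['IE 7', 'IE 8']],
    index=pd.MultiIndex.from_tuples(
        [(1, '1_1'), (2, '2_1'), (2, '2_2')],
        names=['user_id', 'cookie']),
    name='browser')

r = defaultdict(int)
# Unpack the index tuple to get user_id, then add the list length to its total
for (user_id, cookie), browser_list in browsers.items():
    r[user_id] += len(browser_list)

print(dict(r))
```

For large data the groupby approach is likely to be faster, but the explicit loop may be easier to follow and extend.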
Upvotes: 2