Cheng

Reputation: 17904

Pandas how to loop through a MultiIndex series

I have a MultiIndex Series that looks like this:

user_id  cookie  browser
1        1_1     [chrome45]
2        2_1     [IE 7]
2        2_2     [IE 7, IE 8]

There are two levels to this MultiIndex, user_id and cookie. The value is the browser.

What I want to do is to count the number of times a user uses a different browser.

So in this case user 1 used only one browser, but user 2 used three (IE 7 appears under two different cookies, so I count it twice instead of once).

How can I loop through it and get a result like this:

r = defaultdict(int)

for user_id in multiIndex_series:
    for cookie in multiIndex_series[user_id]:
        r[user_id] += len(multiIndex_series[user_id][cookie]) # I don't know how to get user_id out of the MultiIndex series

Upvotes: 1

Views: 3543

Answers (1)

jezrael

Reputation: 862701

You can use groupby with apply and a lambda function that returns the length of the flattened lists (see answer for more info):

import pandas as pd

df = pd.DataFrame({'user_id':[1,2,2],
                   'cookie':['1_1','2_1','2_2'],
                   'browser':[['chrome45'],['IE 7'],['IE 7','IE 8']]})
df = df.set_index(['user_id','cookie'])
print (df)
                     browser
user_id cookie              
1       1_1       [chrome45]
2       2_1           [IE 7]
        2_2     [IE 7, IE 8]

from itertools import chain
print (df.groupby(level='user_id')['browser']
         .apply(lambda x: len(list(chain.from_iterable(x)))))
user_id
1    1
2    3
Name: browser, dtype: int64
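
As a possible alternative to flattening, the same counts can be obtained by taking the length of each list first and then summing per user. This is only a sketch under the assumption that every value in the Series is a list; the variable names s and counts are made up for illustration and do not come from the original post.

```python
import pandas as pd

# Same sample data as above, built directly as a MultiIndex Series.
s = pd.Series(
    [['chrome45'], ['IE 7'], ['IE 7', 'IE 8']],
    index=pd.MultiIndex.from_tuples(
        [(1, '1_1'), (2, '2_1'), (2, '2_2')],
        names=['user_id', 'cookie'],
    ),
    name='browser',
)

# Length of each browser list, then summed over the user_id level.
counts = s.map(len).groupby(level='user_id').sum()
print(counts)
```

This gives 1 for user 1 and 3 for user 2, matching the groupby/apply result, and avoids building the flattened list for each group.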

Instead of a lambda, you can use a custom function f, which is easier to test and debug:

def f(x):
    print (list(chain.from_iterable(x)))
    return len(list(chain.from_iterable(x)))

print (df.groupby(level='user_id')['browser'].apply(f))
['chrome45']
['IE 7', 'IE 7', 'IE 8']
user_id
1    1
2    3
Name: browser, dtype: int64

If you need to loop over the Series, one possible solution is:

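# items() replaces iteritems(), which was removed in pandas 2.0
# idx is the full (user_id, cookie) index tuple
for idx, val in df['browser'].items():
    print (idx)
    print (val)

(1, '1_1')
['chrome45']
(2, '2_1')
['IE 7']
(2, '2_2')
['IE 7', 'IE 8']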
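
Building on that loop, the defaultdict from the question could be filled by unpacking the index tuple into its two levels. This is a minimal sketch, assuming the index has exactly the two levels user_id and cookie; the name s is hypothetical and stands for the browser Series.

```python
from collections import defaultdict

import pandas as pd

# Same sample data as above, as a MultiIndex Series.
s = pd.Series(
    [['chrome45'], ['IE 7'], ['IE 7', 'IE 8']],
    index=pd.MultiIndex.from_tuples(
        [(1, '1_1'), (2, '2_1'), (2, '2_2')],
        names=['user_id', 'cookie'],
    ),
    name='browser',
)

r = defaultdict(int)
# Each key yielded by items() is a (user_id, cookie) tuple.
for (user_id, cookie), browsers in s.items():
    r[user_id] += len(browsers)

print(dict(r))
```

The resulting counts are 1 for user 1 and 3 for user 2. For large data the groupby approach is likely the better choice, since this loop runs row by row in Python.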

Upvotes: 2
