Reputation: 221
I have a pandas dataframe that looks like this:
Client 1_act 2_act 3_act 4_act 5_act 6_act ...
1 hiking swimming skating jumping climbing eating
2 eating hiking climbing exploring
3 hiking exercising
4 hiking screaming yelling hopping swimming
...
Every row only has unique activities and can have MANY columns all named #_act (new columns can also be added at any time as a client reports a new activity). Every row has at least a pair (no one has less than 2 activities per row). New activity values can also be added at any time.
I am trying to find a way to return the most common pair of activity. So desired output would be like:
Pair Qty
(hiking, swimming) 2
(hiking, skating) 1
(hiking, jumping) 1
(hiking, climbing) 2
(hiking, eating) 2
(swimming, skating) 1
(swimming, jumping) 1
(swimming, climbing) 1
(swimming, eating) 1
(skating, jumping) 1
(skating, climbing) 1
(skating, eating) 1
(jumping, climbing) 1
(climbing, eating) 2
(eating, exploring) 1
(hiking, exercising) 1
(hiking, screaming) 1
(hiking, yelling) 1
(hiking, hopping) 1
...
The above output is an example output of all possible pairs in each row across all their columns in this sample dataset. If a pair repeated in subsequent rows it should add to the quantity, and if a new pair occurs in subsequent rows, it should be added to the column of pairs as a new row.
The purpose of this is to see what pair of activities are the most common among all clients. Any help would be greatly appreciated! Thank you!!
Upvotes: 1
Views: 1213
Reputation: 862511
Use combinations
in list comprehension for flattening, count tuples by Counter
and pass to DataFrame
constructor:
from collections import Counter
from itertools import combinations
df = df.set_index('Client')
c = Counter([y for x in df.values for y in combinations(x, 2)])
df = pd.DataFrame({'Pair': list(c.keys()), 'Qty': list(c.values())})
For top combinations:
n = 10
L = Counter([y for x in df.values for y in combinations(x, 2)]).most_common(n)
df = pd.DataFrame(L, columns=['Pair', 'Qty'])
print (df)
Pair Qty
0 (hiking, swimming) 2
1 (hiking, climbing) 2
2 (hiking, eating) 2
3 (swimming, eating) 2
4 (hiking, hopping) 2
5 (hiking, skating) 1
6 (hiking, jumping) 1
7 (swimming, skating) 1
8 (swimming, jumping) 1
9 (swimming, climbing) 1
Upvotes: 2