Reputation: 2089
My dummy data frame is as follows:
+--------+------+------+------+------+
| item | p1 | p2 | p3 | p4 |
|--------+------+------+------+------|
| a | 1 | 0 | 1 | 1 |
| b | 0 | 1 | 1 | 0 |
| c | 1 | 0 | 1 | 1 |
| d | 0 | 0 | 0 | 1 |
| e | 1 | 0 | 1 | 1 |
| f | 1 | 1 | 1 | 1 |
| g | 1 | 0 | 0 | 0 |
+--------+------+------+------+------+
I want to find the the number of ways in which the parameters p1,p2,p3,p4
are used in combination or not. Expected result is something like follows:
+--------+------+--------+--------+--------+
| Length | P-groups(s) | Count | Items |
+--------+---------------+--------+--------+
| 1 | p1 | 1 | g |
| | p4 | 1 | d |
| | | | |
| 2 | p2,p3 | 1 | b |
| | | | |
| 3 | p1,p2,p3 | 3 | [a,c,e]|
| | | | |
| 4 | p1,p2,p3,p4 | 1 | f |
+--------+---------------+--------+--------+
So, my crude code is as follows:
import pandas as pd
from itertools import chain, combinations
df= pd.DataFrame({'item': ['a','b','c','d','e','f','g'],
'p1': [1,0,1,0,1,1,1],
'p2': [0,1,0,0,0,1,0],
'p3': [1,1,1,0,1,1,0],
'p4': [1,0,1,1,1,1,0]})
def all_subsets(ss):
return chain(*map(lambda x: combinations(ss, x), range(0, len(ss)+1)))
subsets = []
for subset in all_subsets(list(df)[1:]):
subsets.append(list(subset))
for grp in subsets[1:]: #subset[1:] is to exclude empty set
print df.groupby(grp).size().reset_index().rename(columns={0:'count'})
I was wondering if there is any pandas way to get the expected result?
Upvotes: 2
Views: 305
Reputation: 29742
Using pd.groupby
with pd.filter
:
import pandas as pd
tmp = df.filter(like='p')
new = tmp.replace(1, pd.Series(tmp.columns, tmp.columns)).copy(deep=True)
df['length'] = tmp.sum(1)
df['groups'] = new.apply(lambda x:','.join(s for s in x if s), 1)
gdf = df.groupby(['length', 'groups'])['item'].agg(['count', list])
print(gdf)
Output:
count list
length groups
1 p1 1 [g]
p4 1 [d]
2 p2,p3 1 [b]
3 p1,p3,p4 3 [a, c, e]
4 p1,p2,p3,p4 1 [f]
If you want to unpack the gdf['list']
add the follow line:
gdf['list'] = [l[0] if len(l)==1 else l for l in gdf['list']]
which makes just like the desired output:
count list
length groups
1 p1 1 g
p4 1 d
2 p2,p3 1 b
3 p1,p3,p4 3 [a, c, e]
4 p1,p2,p3,p4 1 f
Upvotes: 2