Reputation: 793
>>> df = pd.DataFrame({'a': [1,1,1,1,2,2,2,2,3,3,3,3],
'b': [0,0,1,1,0,0,1,1,0,0,1,1,],
'c': [5,5,5,8,9,9,6,6,7,8,9,9]})
>>> df
a b c
0 1 0 5
1 1 0 5
2 1 1 5
3 1 1 8
4 2 0 9
5 2 0 9
6 2 1 6
7 2 1 6
8 3 0 7
9 3 0 8
10 3 1 9
11 3 1 9
Is there an alternative way to get this output?
>>> pd.pivot_table(df, index=['a','b'], columns='c', aggfunc=len, fill_value=0).reset_index()
c a b 5 6 7 8 9
0 1 0 2 0 0 0 0
1 1 1 1 0 0 1 0
2 2 0 0 0 0 0 2
3 2 1 0 2 0 0 0
4 3 0 0 0 1 1 0
5 3 1 0 0 0 0 2
I have a large df
(>~1m lines) with len(df.c.unique())
being 134 so pivot
is taking forever.
I was thinking that, given that this result is returned within a second in my actual df
:
>>> df.groupby(by = ['a', 'b', 'c']).size().reset_index()
a b c 0
0 1 0 5 2
1 1 1 5 1
2 1 1 8 1
3 2 0 9 2
4 2 1 6 2
5 3 0 7 1
6 3 0 8 1
7 3 1 9 2
whether I could manually construct the desired outcome from this output above
Upvotes: 0
Views: 345
Reputation: 153500
df.groupby(by = ['a', 'b', 'c']).size().unstack(fill_value=0).reset_index()
Output:
c a b 5 6 7 8 9
0 1 0 2 0 0 0 0
1 1 1 1 0 0 1 0
2 2 0 0 0 0 0 2
3 2 1 0 2 0 0 0
4 3 0 0 0 1 1 0
5 3 1 0 0 0 0 2
pd.crosstab([df.a,df.b], df.c).reset_index()
Output:
c a b 5 6 7 8 9
0 1 0 2 0 0 0 0
1 1 1 1 0 0 1 0
2 2 0 0 0 0 0 2
3 2 1 0 2 0 0 0
4 3 0 0 0 1 1 0
5 3 1 0 0 0 0 2
Upvotes: 4