Tony
Tony

Reputation: 793

alternative way to construct pivot table

>>> df = pd.DataFrame({'a': [1,1,1,1,2,2,2,2,3,3,3,3], 
           'b': [0,0,1,1,0,0,1,1,0,0,1,1,], 
                   'c': [5,5,5,8,9,9,6,6,7,8,9,9]})

>>> df
    a  b  c
0   1  0  5
1   1  0  5
2   1  1  5
3   1  1  8
4   2  0  9
5   2  0  9
6   2  1  6
7   2  1  6
8   3  0  7
9   3  0  8
10  3  1  9
11  3  1  9

Is there an alternative way to get this output?

>>> pd.pivot_table(df, index=['a','b'], columns='c', aggfunc=len, fill_value=0).reset_index()
c  a  b  5  6  7  8  9
0  1  0  2  0  0  0  0
1  1  1  1  0  0  1  0
2  2  0  0  0  0  0  2
3  2  1  0  2  0  0  0
4  3  0  0  0  1  1  0
5  3  1  0  0  0  0  2

I have a large df (>~1m lines) with len(df.c.unique()) being 134 so pivot is taking forever.

I was thinking that, given that this result is returned within a second in my actual df:

>>> df.groupby(by = ['a', 'b', 'c']).size().reset_index()
   a  b  c  0
0  1  0  5  2
1  1  1  5  1
2  1  1  8  1
3  2  0  9  2
4  2  1  6  2
5  3  0  7  1
6  3  0  8  1
7  3  1  9  2

whether I could manually construct the desired outcome from this output above

Upvotes: 0

Views: 345

Answers (1)

Scott Boston
Scott Boston

Reputation: 153500

1. Here's one:

df.groupby(by = ['a', 'b', 'c']).size().unstack(fill_value=0).reset_index()

Output:

c  a  b  5  6  7  8  9
0  1  0  2  0  0  0  0
1  1  1  1  0  0  1  0
2  2  0  0  0  0  0  2
3  2  1  0  2  0  0  0
4  3  0  0  0  1  1  0
5  3  1  0  0  0  0  2

2. Here's another way:

pd.crosstab([df.a,df.b], df.c).reset_index()

Output:

c  a  b  5  6  7  8  9
0  1  0  2  0  0  0  0
1  1  1  1  0  0  1  0
2  2  0  0  0  0  0  2
3  2  1  0  2  0  0  0
4  3  0  0  0  1  1  0
5  3  1  0  0  0  0  2

Upvotes: 4

Related Questions