Reputation: 444
I got the following pandas dataframe:
Id Category
1 type 2
1 type 3
1 type 2
2 type 1
2 type 2
I need to process and transpose the above data frame to:
Id Category_type_1 Category_type_2 Category_type_3
1 0 2 1
2 1 1 0
Appreciate if anyone could shows the easiest way to code this in python.
Upvotes: 3
Views: 342
Reputation: 862751
Use pivot_table
:
print (df.pivot_table(index='Id', columns='Category', aggfunc=len, fill_value=0))
Category type 1 type 2 type 3
Id
1 0 2 1
2 1 1 0
Timings:
Small DataFrame - len(df)=5
:
In [63]: %timeit df.groupby(df.columns.tolist()).size().unstack().fillna(0)
1000 loops, best of 3: 1.33 ms per loop
In [64]: %timeit (df.pivot_table(index='Id', columns='Category', aggfunc=len, fill_value=0))
100 loops, best of 3: 3.77 ms per loop
In [65]: %timeit pd.crosstab(df['Id'], df['Category'])
100 loops, best of 3: 4.82 ms per loop
Large DataFrame - len(df)=5k
:
df = pd.concat([df]*1000).reset_index(drop=True)
In [59]: %timeit df.groupby(df.columns.tolist()).size().unstack().fillna(0)
1000 loops, best of 3: 1.73 ms per loop
In [60]: %timeit (df.pivot_table(index='Id', columns='Category', aggfunc=len, fill_value=0))
100 loops, best of 3: 4.64 ms per loop
In [61]: %timeit pd.crosstab(df['Id'], df['Category'])
100 loops, best of 3: 5.46 ms per loop
Very large DataFrame - len(df)=5m
:
df = pd.concat([df]*1000000).reset_index(drop=True)
In [55]: %timeit df.groupby(df.columns.tolist()).size().unstack().fillna(0)
1 loop, best of 3: 514 ms per loop
In [56]: %timeit (df.pivot_table(index='Id', columns='Category', aggfunc=len, fill_value=0))
1 loop, best of 3: 907 ms per loop
In [57]: %timeit pd.crosstab(df['Id'], df['Category'])
1 loop, best of 3: 822 ms per loop
Upvotes: 2
Reputation: 294318
I'd groupby
and use size
df.groupby(df.columns.tolist()).size().unstack().fillna(0)
Upvotes: 3
Reputation:
pd.crosstab(df['Id'], df['Category'])
Out:
Category type 1 type 2 type 3
Id
1 0 2 1
2 1 1 0
Upvotes: 3