Reputation: 505
Considering that I have a data frame with the following structure:
Symbol CS
0 A1 MSI
1 A2 MSI
2 A3 GS
3 A2 EBV
4 A2 CIN
5 A1 GS
6 A2 MSI
...
I would like to have a dataframe that categorizes the column CS as the header and the column Symbol as the index. The values would be the occurrence of each symbol by the CS type. As an example, the output expected from the above dataframe would be:
MSI EBV CIN GS
A1 1 0 0 1
A2 2 1 1 0
A3 0 0 0 1
What would be the best efficient approach to resolve this particular case?
Upvotes: 1
Views: 605
Reputation: 323236
You can using crosstab
pd.crosstab(df.Symbol,df.CS)
Out[862]:
CS CIN EBV GS MSI
Symbol
A1 0 0 1 1
A2 1 1 0 2
A3 0 0 1 0
Upvotes: 5
Reputation: 164673
One way is to use pd.value_counts
followed by pd.pivot_table
.
Edit: But really you should use @Wen's solution.
counts = df.groupby('Symbol')['CS'].apply(pd.value_counts).reset_index()
res = counts.pivot_table(index='Symbol', columns='level_1', values='CS',
fill_value=0, aggfunc=lambda x: x)
print(res)
# level_1 CIN EBV GS MSI
# Symbol
# A1 0 0 1 1
# A2 1 1 0 2
# A3 0 0 1 0
Upvotes: 1