Reputation: 1001
I have a pandas dataframe that looks like this:
|Var1 | Var2 | Method | Category|
|-------------------------------------|
| A | 1 | check | cat1 |
| A | 1 | check | cat2 |
| B | 2 | check | cat1 |
| B | 2 | creditcard | cat2 |
where Method can only be either check or creditcard, and the Category is also limited to just two possible categories.
Whenever "Method" is the same for a given Var 1 and Var 2, I'd like to combine the two category rows into columns, like so:
|Var1 | Var2 | Method | Cat1 | Cat2 |
|----------------------------------------|
| A | 1 | check | cat1 | cat2 |
| B | 2 | check | cat1 | NaN |
| B | 2 | creditcard| NaN | cat 2|
Is this even possible? I'd tried various groupby/pivoting/index reset arrangements, but nothing worked.
Thank you for any help!
Upvotes: 0
Views: 48
Reputation: 323226
One way using groupby
create the join
string , then it is get_dummies
problem
s=df.groupby(['Var1','Var2','Method'])['Category'].\
apply(','.join).str.get_dummies(',')
s=s*s.columns
s.reset_index(inplace=True)
s
Var1 Var2 Method cat1 cat2
0 A 1 check cat1 cat2
1 B 2 check cat1
2 B 2 creditcard cat2
Upvotes: 1