Reputation: 449
I have a dataframe like below:
ID  Emp1   Emp2   Emp3
1   John   NaN    Alex
2   John   Steve  Alex
3   John   Steve  Alex
4   Clint  Jorge  NaN
I would like to convert the above dataframe into something like this:
John   Emp1  [1,2,3]
Clint  Emp1  [4]
Steve  Emp2  [2,3]
Jorge  Emp2  [4]
Alex   Emp3  [1,2,3]
So, basically, for each column (Emp1, Emp2, Emp3), find the unique values (dropping NaN), and for each unique value get the list of IDs it appears in along with the column name.
Upvotes: 0
Views: 38
Reputation: 13417
You'll need to melt your data to get it into long format. Then perform a groupby aggregation on the "name" and "emp" columns to collect the matching IDs into a list:
new_df = (
    df
    .melt(id_vars="ID", var_name="emp", value_name="name")
    .dropna()
    .groupby(["name", "emp"], as_index=False)
    .agg(list)
    .sort_values(["emp", "name"], ascending=[True, False])
)
print(new_df)
    name   emp         ID
2   John  Emp1  [1, 2, 3]
1  Clint  Emp1        [4]
4  Steve  Emp2     [2, 3]
3  Jorge  Emp2        [4]
0   Alex  Emp3  [1, 2, 3]
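If you want to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question (assuming the missing entries are real NaN values, not the string "NaN") and splits the pipeline into two steps so you can inspect the long-format intermediate. The `long_df` name and the final `reset_index(drop=True)` are my own additions for illustration; they are not required.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; np.nan marks a missing employee.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Emp1": ["John", "John", "John", "Clint"],
    "Emp2": [np.nan, "Steve", "Steve", "Jorge"],
    "Emp3": ["Alex", "Alex", "Alex", np.nan],
})

# Step 1: wide -> long. Each row becomes one (ID, emp, name) triple,
# and rows whose name is NaN are dropped.
long_df = df.melt(id_vars="ID", var_name="emp", value_name="name").dropna()

# Step 2: gather the IDs of every (name, emp) pair into a list, then
# order by column name ascending and employee name descending.
new_df = (
    long_df
    .groupby(["name", "emp"], as_index=False)
    .agg(list)
    .sort_values(["emp", "name"], ascending=[True, False])
    .reset_index(drop=True)  # optional: renumber rows after sorting
)
print(new_df)
```

Note that with this sample data Alex appears under Emp3 for IDs 1, 2 and 3, so that is the list you get for that row.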
Upvotes: 2