Reputation: 369
I am new to pandas and facing difficulty to get an idea, to solve this problem, can someone help me with a way to approach this.
I have a source dataframe in the below format
Here I am sharing the code to generate source dataframe:
df = pd.DataFrame({"Employee ID":[1, 2, 3, 4, 5], "Name":["A", "B", "C", "D", "E"],
"Departments":["hr", "hr, manager", "manager", "developer", "tester, manager, developer"],
"Groups":["group-1", "group-2, group-3", "", "", "group-1"]})
print(df)
I would like to convert it to the below format
I have tried with pivot table and other stackover flow example but it's not helping much, can someone help me a way to understand and solve this problem.
Thanks in advance :)
Upvotes: 3
Views: 294
Reputation: 862751
You can use custom function with Series.str.get_dummies
, then if need replace values by columns names add numpy.where
, but first convert columns for not processing to MultiIndex
by DataFrame.set_index
:
def f(x):
m = x.str.get_dummies(', ').astype(bool)
a = np.where(m, m.columns, '')
return pd.DataFrame(a, columns=m.columns, index=x.index)
df1 = df.set_index(['Employee ID','Name'])
df = pd.concat([f(df1[x]) for x in df1.columns], axis=1, keys=df1.columns)
print (df)
Departments Groups
developer hr manager tester group-1 group-2 group-3
Employee ID Name
1 A hr group-1
2 B hr manager group-2 group-3
3 C manager
4 D developer
5 E developer manager tester group-1
Upvotes: 3