Reputation: 51
Input:-
empNo name
1234 [ AB, DE ]
5678 [ FG, IJ ]
Command:-
dataFrame = dataFrame.join(dataFrame.name.str.join('|').str.get_dummies().add_prefix('dummy_name_'))
The above command creates a dummy column for each character in the name column, rather than one for each list element.
Output:-
empNo name dummy_name_A dummy_name_B dummy_name_D dummy_name_E dummy_name_F dummy_name_G dummy_name_I dummy_name_J
1234 [ AB, DE ] 1 1 1 1 0 0 0 0
5678 [ FG, IJ ] 0 0 0 0 1 1 1 1
Expected:-
empNo name dummy_name_AB dummy_name_DE dummy_name_FG dummy_name_IJ
1234 [ AB, DE ] 1 1 0 0
5678 [ FG, IJ ] 0 0 1 1
Upvotes: 2
Views: 1766
Reputation: 323226
I suspect the name column does not hold actual lists but their string representation, so use ast.literal_eval to convert the string column back to lists:
import ast
df.name=df.name.apply(ast.literal_eval)
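To see why the conversion matters, here is a minimal sketch. It assumes the cells look like "['AB', 'DE']" (a guess on my part, since the question does not show the raw values), and the names joined_chars and joined_items are only for illustration. On a plain string, .str.join puts the separator between every character; on a real list it joins the elements.

```python
import ast
import pandas as pd

# Assumed input: lists stored as their string representation
df = pd.DataFrame({
    "empNo": [1234, 5678],
    "name": ["['AB', 'DE']", "['FG', 'IJ']"],
})

# Each cell is a str, so the separator lands between single characters
joined_chars = df["name"].str.join("|")

# Parse every cell into a real Python list
df["name"] = df["name"].apply(ast.literal_eval)

# Now the separator lands between list elements
joined_items = df["name"].str.join("|")
print(joined_items.tolist())
```

Once the cells are real lists, the command from the question should produce one dummy column per list element.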
Then use str.get_dummies and collapse the result back to one row per original index:
s=df.name.apply(pd.Series).stack().str.get_dummies().groupby(level=0).sum().add_prefix('dummy_name_')
s
dummy_name_AB dummy_name_DE dummy_name_FG dummy_name_IJ
0 1 1 0 0
1 0 0 1 1
Then concatenate the dummies with the empNo column:
pd.concat([df[['empNo']],s],axis=1)
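As an alternative sketch of the same steps, Series.explode can replace apply(pd.Series).stack(). This is not the approach above, just a variant that assumes the name column already holds real lists; dummies and result are names made up for the example. It uses max instead of sum so that a value repeated within one list still gives 1.

```python
import pandas as pd

# Same data as in the question, with real lists in the name column
df = pd.DataFrame({
    "empNo": [1234, 5678],
    "name": [["AB", "DE"], ["FG", "IJ"]],
})

dummies = (
    df["name"]
    .explode()               # one row per list element, original index repeated
    .str.get_dummies()       # one indicator column per distinct element
    .groupby(level=0).max()  # collapse back to one row per original index
    .add_prefix("dummy_name_")
)

# Keep the original columns and append the dummy columns
result = df.join(dummies)
print(result)
```

Using df.join keeps the name column next to the dummies, which matches the expected layout in the question.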
The input data used above:
df.to_dict()
{'empNo': {0: 1234, 1: 5678}, 'name': {0: ['AB', 'DE'], 1: ['FG', 'IJ']}}
Upvotes: 5