I have a dataframe that looks like this one:
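import numpy as np
import pandas as pd

# object dtype so the cells can hold strings alongside NaN
df = pd.DataFrame(np.nan, index=[0,1,2,3], columns=['A','B','C'], dtype=object)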
df.iloc[0,0] = 'a'
df.iloc[1,0] = 'b'
df.iloc[1,1] = 'c'
df.iloc[2,0] = 'b'
df.iloc[3,0] = 'c'
df.iloc[3,1] = 'b'
df.iloc[3,2] = 'd'
df
Out:
   A    B    C
0  a  NaN  NaN
1  b    c  NaN
2  b  NaN  NaN
3  c    b    d
I would like to add new columns whose names are the values found inside the dataframe (here 'a', 'b', 'c', and 'd'). These columns are binary and indicate whether each of the values 'a', 'b', 'c', and 'd' appears in the row.
In one picture, the output I'd like is:
   A    B    C  a  b  c  d
0  a  NaN  NaN  1  0  0  0
1  b    c  NaN  0  1  1  0
2  b  NaN  NaN  0  1  0  0
3  c    b    d  0  1  1  1
To do this I first create the columns filled with zeros:
cols = pd.Series(df.values.ravel()).value_counts().index
for col in cols:
    df[col] = 0
(It doesn't create the columns in the right order, but that doesn't matter)
Then I...use a loop over the rows and columns...
for row in df.index:
    for col in cols:
        # flag the row if this value appears anywhere in it
        if col in df.loc[row].values:
            df.loc[row, col] = 1  # .loc replaces the deprecated .ix indexer
You can see why I'm looking for another way to do it: even though my dataframe is relatively small (76k rows), this still takes around 8 minutes, which is far too long.
Any idea?
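For reference, one direction that might avoid the row loop is to iterate over the distinct values only and let pandas compare whole columns at once. This is only a sketch on the sample data above, not benchmarked on the full 76k rows, and it assumes the number of distinct values is small. The names `source_cols` and `values` are illustrative and not part of my real code.

```python
import numpy as np
import pandas as pd

# Sample frame from the question
df = pd.DataFrame(
    [["a", np.nan, np.nan],
     ["b", "c", np.nan],
     ["b", np.nan, np.nan],
     ["c", "b", "d"]],
    columns=["A", "B", "C"],
)

# Hypothetical helper names: the original columns and the distinct non-NaN values
source_cols = ["A", "B", "C"]
values = pd.Series(df[source_cols].to_numpy().ravel()).dropna().unique()

for value in sorted(values):
    # eq() compares every cell at once; any(axis=1) checks each row
    df[value] = df[source_cols].eq(value).any(axis=1).astype(int)

print(df)
```

This still loops, but only once per distinct value rather than once per row and value, so the per-row work is left to pandas.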