ysearka

Reputation: 3855

Create binary columns in a dataframe from a condition on its values

I have a dataframe that looks like this one:

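import numpy as np
import pandas as pd

# object dtype so the string assignments below are valid
df = pd.DataFrame(np.nan, index=[0,1,2,3], columns=['A','B','C'], dtype=object)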
df.iloc[0,0] = 'a'
df.iloc[1,0] = 'b'
df.iloc[1,1] = 'c'
df.iloc[2,0] = 'b'
df.iloc[3,0] = 'c'
df.iloc[3,1] = 'b'
df.iloc[3,2] = 'd'
df

out :   A   B   C
   0    a   NaN NaN
   1    b   c   NaN
   2    b   NaN NaN
   3    c   b   d

I would like to add new columns whose names are the values found inside the dataframe (here 'a', 'b', 'c' and 'd'). Those columns are binary and indicate whether each of the values 'a', 'b', 'c' and 'd' appears in the row.

In one picture, the output I'd like is:

        A   B   C    a   b   c   d
   0    a   NaN NaN  1   0   0   0
   1    b   c   NaN  0   1   1   0
   2    b   NaN NaN  0   1   0   0
   3    c   b   d    0   1   1   1

To do this I first create the columns filled with zeros:

cols = pd.Series(df.values.ravel()).value_counts().index
for col in cols:
    df[col] = 0

(It doesn't create the columns in the right order, but that doesn't matter)

Then I...use a loop over the rows and columns...

for row in df.index:
    for col in cols:
        if col in df.loc[row].values:
            df.loc[row, col] = 1

You can see why I'm looking for another way to do it: even though my dataframe is relatively small (76k rows), this still takes around 8 minutes, which is far too long.

Any idea?

Upvotes: 1

Views: 1087

Answers (1)

IanS

Reputation: 16241

You're looking for get_dummies. Here I choose to use the .str version, which splits each string on '|' by default:

df.fillna('', inplace=True)
(df.A + '|' + df.B + '|' + df.C).str.get_dummies()

Output:

   a  b  c  d
0  1  0  0  0
1  0  1  1  0
2  0  1  0  0
3  0  1  1  1
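
If you want the indicator columns next to the original ones, as in the question's desired output, something along these lines should work. This is only a sketch: the names joined, dummies and result are mine, the frame is rebuilt from a dict purely for convenience, and the row-wise apply is there so that it works for any number of columns without overwriting the NaN values in df.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (dict form used here for brevity).
df = pd.DataFrame(
    {
        "A": ["a", "b", "b", "c"],
        "B": [np.nan, "c", np.nan, "b"],
        "C": [np.nan, np.nan, np.nan, "d"],
    }
)

# Join each row's non-missing values with '|', the default separator
# that str.get_dummies splits on. df itself is left untouched.
joined = df.apply(lambda row: "|".join(row.dropna().astype(str)), axis=1)

# One 0/1 indicator column per distinct value found in the frame.
dummies = joined.str.get_dummies()

# Attach the indicators to the original columns (aligned on the index).
result = df.join(dummies)
print(result)
```

The apply over rows is still a Python-level loop, so on a large frame the explicit column concatenation above may be faster when the column names are known in advance.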

Upvotes: 3
