Rahul Agarwal

Reputation: 4100

Make cell values into columns with one-hot encoding

Input df:

ID  Values
1   1;6;7
2   1;6;7
3   5;7
4   1;5;9;10;2;3

Expected df:

ID  1  2  3  4  5  6  7  8  9  10
1   1  0  0  0  0  1  1  0  0   0
2   1  0  0  0  0  1  1  0  0   0
3   0  0  0  0  1  0  1  0  0   0
4   1  1  1  0  1  0  0  0  1   1

Problem Statement:

I have a column Values that holds semicolon-separated values. I want to turn these values into column names and fill those columns with 1 or 0.

Example: ID 1 has 1;6;7, so ID 1 gets 1 in columns 1, 6 and 7, and 0 everywhere else.

I couldn't find a solution that achieves this.
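To pin down the rule from the example, here is a small plain-Python sketch of the per-row mapping (not a pandas solution). `encode_cell` is a hypothetical helper, and the column range 1 to 10 is assumed from the expected output.

```python
# Hypothetical helper: one-hot encode a single ';'-separated cell
# into a 0/1 list for columns 1..n_cols (assumed to be 10 here).
def encode_cell(cell: str, n_cols: int = 10) -> list[int]:
    present = {int(v) for v in cell.split(";")}  # tokens in this cell, as ints
    return [1 if col in present else 0 for col in range(1, n_cols + 1)]


rows = {1: "1;6;7", 2: "1;6;7", 3: "5;7", 4: "1;5;9;10;2;3"}
for row_id, cell in rows.items():
    print(row_id, encode_cell(cell))
```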

Upvotes: 4

Views: 44

Answers (1)

Chris Adams

Reputation: 18647

Use Series.str.get_dummies with argument sep=';'.
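It helps to look at what get_dummies returns on its own. A minimal sketch that rebuilds the question's data, assuming ID is used as the index:

```python
import pandas as pd

# Rebuild the question's frame (assumption: ID is the index).
df = pd.DataFrame(
    {"Values": ["1;6;7", "1;6;7", "5;7", "1;5;9;10;2;3"]},
    index=pd.Index([1, 2, 3, 4], name="ID"),
)

# Split each cell on ';' and emit one 0/1 column per distinct token.
dummies = df["Values"].str.get_dummies(sep=";")

# The new column labels are strings, sorted as strings.
print(list(dummies.columns))
# ['1', '10', '2', '3', '5', '6', '7', '9']
```

Two things stand out: '10' sorts before '2' because the labels are strings, and columns 4 and 8 are missing because no row contains those values.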

The resulting column names are strings, so it's necessary to convert them to int with DataFrame.rename, then use DataFrame.reindex with numpy.arange to add the missing columns (4 and 8 here) in numeric order for your desired output:

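import numpy as np

(df.Values.str.get_dummies(sep=';')                  # one 0/1 column per token; labels are strings
 .rename(columns=int)                                # '10' -> 10, so columns compare numerically
 .reindex(np.arange(1, 11), axis=1, fill_value=0))   # add columns 4 and 8, order 1..10

[out]

   1  2  3  4  5  6  7  8  9  10
1  1  0  0  0  0  1  1  0  0   0
2  1  0  0  0  0  1  1  0  0   0
3  0  0  0  0  1  0  1  0  0   0
4  1  1  1  0  1  0  0  0  1   1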
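The chain can be checked end to end with a self-contained sketch. It assumes ID is an ordinary column rather than the index, and it derives the column range from the largest value present instead of hard-coding 10; both are choices made for this example, not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": [1, 2, 3, 4],
                   "Values": ["1;6;7", "1;6;7", "5;7", "1;5;9;10;2;3"]})

# One 0/1 column per token, with labels converted from str to int.
dummies = df["Values"].str.get_dummies(sep=";").rename(columns=int)

# Assumption: columns should run from 1 up to the largest value seen.
all_cols = np.arange(1, dummies.columns.max() + 1)
dummies = dummies.reindex(all_cols, axis=1, fill_value=0)

# Put ID back in front to match the expected layout.
result = pd.concat([df[["ID"]], dummies], axis=1)
print(result.to_string(index=False))
```

Deriving the upper bound from the data means a new value such as 12 would add columns up to 12 automatically; if the set of columns is fixed in advance, passing an explicit range as in the answer is simpler.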

Upvotes: 3
