I am trying to create a new DataFrame of binary (0 or 1) values from an existing DataFrame. For every row of the given DataFrame, the program should read the value in each cell and set a 1 in the corresponding column (prod_<value>) of the row with the same index in the new DataFrame.
I have tried the following code snippet:
for col in products:
    index = 0
    for item in products[col]:
        # .ix was removed in pandas 1.0; .loc sets the cell by row/column label
        products_coded.loc[index, 'prod_' + str(item)] = 1
        index = index + 1
It works for a small number of rows, but it takes a long time on large datasets. What is the best way to get the desired result?
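I think you need:

- get_dummies, with the values cast to strings
- max by column name, to merge duplicate columns (in the sample data, 0 and 14 appear in both C and D)
- int, to convert the column names to integers
- reindex, to order the columns and append the missing ones, replacing the resulting NaNs with 0 via the parameter fill_value=0, and to drop the first (0) column
- add_prefix, to rename the columns

import pandas as pd

df = pd.DataFrame({'B': [3, 1, 12, 12, 8],
                   'C': [0, 6, 0, 14, 0],
                   'D': [0, 14, 0, 0, 0]})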
print(df)
    B   C   D
0   3   0   0
1   1   6  14
2  12   0   0
3  12  14   0
4   8   0   0
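# pandas >= 2.0 removed DataFrame.max(level=...), so duplicate columns are merged
# with a groupby on the transposed frame; dtype=int keeps 0/1 instead of True/False
df1 = (pd.get_dummies(df.astype(str), prefix='', prefix_sep='', dtype=int)
         .T.groupby(level=0).max().T
         .rename(columns=int)
         .reindex(columns=range(1, df.values.max() + 1), fill_value=0)
         .add_prefix('prod_'))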
print(df1)
   prod_1  prod_2  prod_3  prod_4  prod_5  prod_6  prod_7  prod_8  prod_9  \
0       0       0       1       0       0       0       0       0       0
1       1       0       0       0       0       1       0       0       0
2       0       0       0       0       0       0       0       0       0
3       0       0       0       0       0       0       0       0       0
4       0       0       0       0       0       0       0       1       0

   prod_10  prod_11  prod_12  prod_13  prod_14
0        0        0        0        0        0
1        0        0        0        0        1
2        0        0        1        0        0
3        0        0        1        0        1
4        0        0        0        0        0
Another similar solution:
# same groupby replacement for the removed max(level=0, axis=1)
df1 = (pd.get_dummies(df.astype(str), prefix='', prefix_sep='', dtype=int)
         .T.groupby(level=0).max().T)
df1.columns = df1.columns.astype(int)
df1 = (df1.reindex(columns=range(1, df1.columns.max() + 1), fill_value=0)
          .add_prefix('prod_'))
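For very large frames, a different technique (NumPy advanced indexing, not the get_dummies pipeline above) may also be worth trying: preallocate a zero array with one column per possible code and write all the 1s in a single vectorised assignment, which avoids casting every value to a string and building intermediate dummy columns. This is a minimal sketch, assuming every cell is a non-negative integer product code and that 0 marks an empty slot; the helper name one_hot_products is hypothetical.

```python
import numpy as np
import pandas as pd


def one_hot_products(df, prefix='prod_'):
    """Set out[i, prefix + str(v)] = 1 for every value v found in row i of df.

    Hypothetical helper. Assumes every cell is a non-negative integer product
    code and that 0 marks an empty slot (it is dropped, as in the
    get_dummies version).
    """
    values = df.to_numpy(dtype=np.int64)
    n_rows, n_cols = values.shape
    n_codes = int(values.max()) + 1

    # One row per input row, one column per possible code 0..max
    out = np.zeros((n_rows, n_codes), dtype=np.int64)

    # Row position of every cell in row-major order: 0,0,0,1,1,1,...
    rows = np.repeat(np.arange(n_rows), n_cols)
    # Advanced indexing writes all the 1s in one vectorised step
    out[rows, values.ravel()] = 1

    # Drop column 0 and label the rest prod_1 .. prod_<max>
    return pd.DataFrame(out[:, 1:], index=df.index,
                        columns=[f'{prefix}{i}' for i in range(1, n_codes)])


df = pd.DataFrame({'B': [3, 1, 12, 12, 8],
                   'C': [0, 6, 0, 14, 0],
                   'D': [0, 14, 0, 0, 0]})
print(one_hot_products(df))
```

Column 0 is allocated only to absorb the empty slots, which is why it is sliced off; if 0 were a real product code, keep that column instead.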