Reputation: 59
I need to set values in a second dataframe based on the row values of my original dataframe. This code works, but its execution time is very high.
I tried several kinds of loops and functions (iterrows, iteritems, apply), but none of them helped.
Here's my code:
%%timeit
for value in tqdm(range(len(data['DPS_NUM']))):
    for col_nm in ts_col:
        temp = data[col_nm][value]
        if temp != '':
            data2[temp][value] = 1
Original dataframe:
col1 col2 col3 col4
123 foo bar zoo
456 bar foo
789 zoo zoo
Expected dataframe:
col1  foo  bar  zoo
123   1    1    1
456   1    1
789             1
My code works, but it is slow and I need to optimize it.
Upvotes: 0
Views: 79
Reputation: 862406
Use get_dummies and aggregate with max per column name:
# if the first column is the index
df = pd.get_dummies(df, prefix='', prefix_sep='').max(axis=1, level=0)
print(df)
      bar  foo  zoo
col1
123     1    1    1
456     1    1    0
789     0    0    1
# if the first column is not the index
# df = pd.get_dummies(df.set_index('col1'), prefix='', prefix_sep='').max(axis=1, level=0)
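Note that the `level` argument of `DataFrame.max` was removed in pandas 2.0, so the line above fails on recent versions, and `get_dummies` now returns booleans unless a `dtype` is passed. Below is a minimal sketch of the same idea for newer pandas. The sample data is reconstructed from the question, so the column placement of the values in rows 456 and 789 is an assumption, as is the use of `''` for empty cells. With empty prefixes, `get_dummies` produces repeated column names (one `foo` per source column), and taking the max per name collapses them into a single indicator.

```python
import pandas as pd

# Sample data reconstructed from the question. Which columns hold the values
# in rows 456 and 789 is a guess, and '' stands for an empty cell.
df = pd.DataFrame(
    {
        "col1": [123, 456, 789],
        "col2": ["foo", "bar", "zoo"],
        "col3": ["bar", "foo", "zoo"],
        "col4": ["zoo", "", ""],
    }
).set_index("col1")

# Turn empty strings into missing values so no dummy column is created for them.
values = df.mask(df.eq(""))

# One indicator column per (source column, value). With empty prefixes the
# column names are just the values, so names such as 'foo' appear repeatedly.
dummies = pd.get_dummies(values, prefix="", prefix_sep="", dtype=int)

# Collapse repeated names: transpose, group the rows by label, take the max,
# and transpose back. This replaces the removed .max(axis=1, level=0).
result = dummies.T.groupby(level=0).max().T
print(result)
```

The double transpose is used here only because grouping along `axis=1` is itself deprecated in recent pandas; on older versions the original one-liner does the same job.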
Upvotes: 1