Reputation: 215
Based on dataframe column del value, looked into other columns col_0-14 and then set value as 100 for that row (not to update nan value).
Dataframe look like:
id val_1 del col_1 col_2 col_3 col_4 col_5 col_6 col_7 col_8 col_9 col_10 col_11 col_12 col_13 col14
1 13 0 0 0 0 0 0 0 0 0 0 0 0 0
2 11 0 0 0 0 0 0 0 0 0 0 0
3 8 1 0 0 0 0 0 7 7 8
4 6 1 500 1000 1500 2000 2500 3000
As del on 3rd, 4th row, code should replace values to 100 up till val_1 value.
3 8 1 100 100 100 100 100 100 100 100
4 6 1 100 100 100 100 100 100
I tried:
df.loc[df['del'] == 1, df.columns.str.startswith('col')] = 100
It will replace all row values (col1-14) to 100. Is there a way, I can control it to only till val_1 value and rest of col remains as nan values. OR
After above code, I could use val_1 value to update row again with nan values with using loop logic.
def cut_data(row):
for i in range(1, 15):
if i > row['val_1']:
row['col_' + str(i)] = np.NaN
return row
df = df.apply(cut_data, axis=1)
Please suggest any alternative to loop logic i.e without loop.
DDL to generate DataFrame:
df1 = pd.DataFrame({'id': [1, 2, 3, 4],
'val_1': [13, 11, 8, 6],
'del' : [np.nan,np.nan,1,1],
'col1': [0, 0, 0, 500],
'col2': [0, 0, 0, 1000],
'col3': [0, 0, 0, 1500],
'col4': [0, 0, 0, 2000],
'col5' : [0, 0, 0, 2500],
'col6': [0, 0, 7, 3000],
'col7': [0, 0, 7,np.nan ],
'col8': [0, 0, 7, np.nan],
'col9': [0, 0, np.nan, np.nan],
'col10': [0, 0, np.nan, np.nan],
'col11': [0, 0, np.nan, np.nan],
'col12': [0, np.nan, np.nan, np.nan],
'col13': [0, np.nan, np.nan, np.nan],
'col14': [ np.nan, np.nan, np.nan, np.nan]})
Thanks!
Upvotes: 1
Views: 1341
Reputation: 862441
Solution by last EDIT:
#extract columns names starting by col
c = df.columns[df.columns.str.startswith('col')]
#created 2d mask by compare 1d arrange array by length of c for 2d mask
colsarray = np.arange(len(c))
max1 = df['val_1'].to_numpy()[:, None]
print (max1)
[[13]
[11]
[ 8]
[ 6]]
mask1 = colsarray < max1
print (mask1)
[[ True True True True True True True True True True True True
True False]
[ True True True True True True True True True True True False
False False]
[ True True True True True True True True False False False False
False False]
[ True True True True True True False False False False False False
False False]]
mask2 = df['del'] == 1
print (mask2)
0 False
1 False
2 True
3 True
Name: del, dtype: bool
#chain by mask2 by & for bitwise AND - first 2 rows are set to False
mask = mask1 & mask2.to_numpy()[:, None]
print (mask)
[[False False False False False False False False False False False False
False False]
[False False False False False False False False False False False False
False False]
[ True True True True True True True True False False False False
False False]
[ True True True True True True False False False False False False
False False]]
#repalce only filtered rows by columns c
df[c] = np.where(mask, 100, df[c])
print (df)
id val_1 del col1 col2 col3 col4 col5 col6 col7 col8 \
0 1 13 NaN 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 2 11 NaN 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 3 8 1.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
3 4 6 1.0 100.0 100.0 100.0 100.0 100.0 100.0 NaN NaN
col9 col10 col11 col12 col13 col14
0 0.0 0.0 0.0 0.0 0.0 NaN
1 0.0 0.0 0.0 NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN
Upvotes: 1