I'm working on a dataframe df:
     0   1     2    3    4    5    6    7
2  418  -5   -81  526  NaN  NaN  NaN  NaN
5  415  -5  -116  487   -5  116  462  -24
7  413  -5   -81  323  NaN  NaN  NaN  NaN
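To make the example reproducible, the sample frame can be rebuilt as below. This is a sketch that assumes the columns use the default integer labels 0 to 7 and the index labels are 2, 5 and 7 as printed. The real df may come from elsewhere and have different dtypes.

```python
import numpy as np
import pandas as pd

# Sample data copied from the printout above; NaN marks the missing cells.
# pandas infers int64 for columns 0-3 and float64 for 4-7 (they contain NaN).
df = pd.DataFrame(
    [
        [418, -5, -81, 526, np.nan, np.nan, np.nan, np.nan],
        [415, -5, -116, 487, -5, 116, 462, -24],
        [413, -5, -81, 323, np.nan, np.nan, np.nan, np.nan],
    ],
    index=[2, 5, 7],
)
print(df)
```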
I wrote code that checks whether column 4 is null. If it is, the code fills columns 4, 5, 6, 7 with the values of columns 0, 1, 2, 3 and adds 4 more columns (8, 9, 10, 11) holding the same values.
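import numpy as np

rows = df.index
for i in rows:
    # Only rows whose column 4 is missing get filled
    if np.isnan(df.loc[i, 4]):
        # Copy columns 0-3 into 4-7
        df.loc[i, 4] = df.loc[i, 0]
        df.loc[i, 5] = df.loc[i, 1]
        df.loc[i, 6] = df.loc[i, 2]
        df.loc[i, 7] = df.loc[i, 3]
        # Copy columns 0-3 into new columns 8-11 (created on first assignment)
        df.loc[i, 8] = df.loc[i, 0]
        df.loc[i, 9] = df.loc[i, 1]
        df.loc[i, 10] = df.loc[i, 2]
        df.loc[i, 11] = df.loc[i, 3]
df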
This gives the desired output:
     0   1     2    3    4    5    6    7    8    9   10   11
2  418  -5   -81  526  418   -5  -81  526  418   -5  -81  526
5  415  -5  -116  487   -5  116  462  -24  NaN  NaN  NaN  NaN
7  413  -5   -81  323  413   -5  -81  323  413   -5  -81  323
My question: how can I optimize this code to avoid the repeated assignments? I tried

df.loc[i, 4:7] = df.loc[i, 0:3]
df.loc[i, 8:11] = df.loc[i, 0:3]

but it doesn't give the desired result.
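A likely reason the slice attempt fails: when the right-hand side of a .loc assignment is a Series, pandas aligns it on labels. Here the source is labelled 0 to 3, so none of its labels match the targets 4 to 7, and NaN is written instead. Columns 8 to 11 also do not exist yet, so the label slice 8:11 has nothing to select. Below is a minimal sketch that keeps the row loop but strips the labels with .to_numpy() and creates the new columns first. It rebuilds the sample frame so it runs on its own, and the variable names src and col are illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (rebuilt so this block is standalone).
df = pd.DataFrame(
    [
        [418, -5, -81, 526, np.nan, np.nan, np.nan, np.nan],
        [415, -5, -116, 487, -5, 116, 462, -24],
        [413, -5, -81, 323, np.nan, np.nan, np.nan, np.nan],
    ],
    index=[2, 5, 7],
)

# Create columns 8-11 up front (all NaN) so the 8:11 slice can find them.
for col in range(8, 12):
    df[col] = np.nan

for i in df.index:
    if pd.isna(df.loc[i, 4]):
        # .to_numpy() drops the 0-3 labels, so the values are written by
        # position instead of being aligned on (non-matching) column labels.
        src = df.loc[i, 0:3].to_numpy()
        df.loc[i, 4:7] = src
        df.loc[i, 8:11] = src

print(df)
```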
This is one way to vectorise your logic:
import pandas as pd

# create null test Boolean series & define replacement array
null_test = df[4].isnull()
arr = df.loc[null_test, [0, 1, 2, 3]].values
# update 4, 5, 6, 7
df.loc[null_test, [4, 5, 6, 7]] = arr
# add additional columns
df = df.join(pd.DataFrame(columns=[8, 9, 10, 11]))
# update 8, 9, 10, 11
df.loc[null_test, [8, 9, 10, 11]] = arr
print(df)
     0   1     2    3      4      5      6      7    8    9   10   11
2  418  -5   -81  526  418.0   -5.0  -81.0  526.0  418   -5  -81  526
5  415  -5  -116  487   -5.0  116.0  462.0  -24.0  NaN  NaN  NaN  NaN
7  413  -5   -81  323  413.0   -5.0  -81.0  323.0  413   -5  -81  323
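Columns 8 to 11 print as 418 while columns 4 to 7 print as 418.0. This appears to happen because joining the empty pd.DataFrame(columns=[8, 9, 10, 11]) gives the new columns object dtype. Below is an alternative sketch, not the answer's code, that keeps them numeric. It relabels the copied rows with DataFrame.set_axis and left-joins them, so rows that were not copied get NaN and the new columns come out as float64. The name src is illustrative, and the sample frame is rebuilt so the block runs on its own.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (rebuilt so this block is standalone).
df = pd.DataFrame(
    [
        [418, -5, -81, 526, np.nan, np.nan, np.nan, np.nan],
        [415, -5, -116, 487, -5, 116, 462, -24],
        [413, -5, -81, 323, np.nan, np.nan, np.nan, np.nan],
    ],
    index=[2, 5, 7],
)

null_test = df[4].isnull()
src = df.loc[null_test, [0, 1, 2, 3]]  # rows to copy, still labelled 0-3

# Fill 4-7 by position (no label alignment) on the rows where column 4 is null.
df.loc[null_test, [4, 5, 6, 7]] = src.to_numpy()

# Relabel the copied rows as 8-11; the left join puts NaN on the other rows,
# so the new columns are float64 rather than object.
df = df.join(src.set_axis([8, 9, 10, 11], axis=1))

print(df)
print(df.dtypes)
```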