trusted

Reputation: 105

Assignments with pd.DataFrame.loc

I'm working on a dataframe df:

     0   1     2     3     4     5     6    7     
2  418  -5   -81   526   NaN   NaN   NaN  NaN   
5  415  -5  -116   487    -5   116   462  -24   
7  413  -5   -81   323   NaN   NaN   NaN  NaN

I wrote some code that checks whether column 4 is null. If it is, columns 4, 5, 6, 7 are filled with the values of columns 0, 1, 2, 3, and four more columns (8, 9, 10, 11) are added with the same values.

import numpy as np

rows = df.index
for i in rows:
    if np.isnan(df.loc[i, 4]):
        # copy columns 0-3 into columns 4-7
        df.loc[i, 4] = df.loc[i, 0]
        df.loc[i, 5] = df.loc[i, 1]
        df.loc[i, 6] = df.loc[i, 2]
        df.loc[i, 7] = df.loc[i, 3]
        # copy columns 0-3 into new columns 8-11
        df.loc[i, 8] = df.loc[i, 0]
        df.loc[i, 9] = df.loc[i, 1]
        df.loc[i, 10] = df.loc[i, 2]
        df.loc[i, 11] = df.loc[i, 3]
df

The output, which is also the desired result:

     0   1     2     3     4     5     6    7    8   9   10   11    
2  418  -5   -81   526   418    -5   -81  526  418  -5  -81  526 
5  415  -5  -116   487    -5   116   462  -24  NaN NaN  NaN  NaN
7  413  -5   -81   323   413    -5   -81  323  413  -5  -81  323

My question: how can I optimize my code to avoid the repeated assignments? Inside the loop, I tried

df.loc[i,4:7]=df.loc[i,0:3]
df.loc[i,8:11]=df.loc[i,0:3]

but it doesn't give the desired result.

Upvotes: 0

Views: 62

Answers (1)

jpp

Reputation: 164623

This is one way to vectorise your logic:

# create null test Boolean series & define replacement array
null_test = df[4].isnull()
arr = df.loc[null_test, [0, 1, 2, 3]].values

# update 4, 5, 6, 7
df.loc[null_test, [4, 5, 6, 7]] = arr

# add additional columns
df = df.join(pd.DataFrame(columns=[8, 9, 10, 11]))

# update 8, 9, 10, 11
df.loc[null_test, [8, 9, 10, 11]] = arr

print(df)

    0   1    2    3      4      5      6      7    8    9    10   11
2  418  -5  -81  526  418.0   -5.0  -81.0  526.0  418   -5  -81  526
5  415  -5 -116  487   -5.0  116.0  462.0  -24.0  NaN  NaN  NaN  NaN
7  413  -5  -81  323  413.0   -5.0  -81.0  323.0  413   -5  -81  323
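
As for why the slice assignment in the question probably did not work: when the right-hand side of a .loc assignment is a pandas object, pandas aligns it on labels before writing. A row slice labelled 0..3 shares no labels with the target columns 4..7, so NaN is written. Converting the right-hand side to a NumPy array drops the labels and assigns by position. The sketch below rebuilds the sample frame from the question to show both behaviours; using reindex and np.tile to fill all eight columns in one step is my own variation on the code above, not something the question asked for.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (index labels 2, 5, 7).
df = pd.DataFrame(
    [
        [418, -5, -81, 526, np.nan, np.nan, np.nan, np.nan],
        [415, -5, -116, 487, -5, 116, 462, -24],
        [413, -5, -81, 323, np.nan, np.nan, np.nan, np.nan],
    ],
    index=[2, 5, 7],
)

# Label alignment: the right-hand Series is labelled 0..3, the target is
# labelled 4..7, so no labels match and NaN is written.
aligned = df.copy()
aligned.loc[2, 4:7] = aligned.loc[2, 0:3]

# A plain array carries no labels, so values are assigned by position.
mask = df[4].isna()
src = df.loc[mask, [0, 1, 2, 3]].to_numpy()

df = df.reindex(columns=range(12))    # add columns 8..11, filled with NaN
df.loc[mask, 4:11] = np.tile(src, 2)  # columns 0..3 repeated twice, side by side

print(df)
```

Note that label slices in .loc include both endpoints, so 4:11 selects eight columns, which matches the width of the tiled array.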

Upvotes: 1
