ZZZ

Reputation: 11

How to efficiently manipulate a DataFrame

My goal: if a value is 2, set that cell and the cell in the next row (same column) to 0. If a value is 3, set that cell to 1 and the cell in the next row to 0. From:

1 1 1
0 2 3
1 1 1

to:

1 1 1
0 0 1
1 0 0

for i in range(len(dfnew)):
    for j in range(len(dfnew.columns)):

        if dfnew.iloc[i, j] == 2:
            dfnew.iloc[i, j] = 0
            dfnew.iloc[i + 1, j] = 0

        if dfnew.iloc[i, j] == 3:
            dfnew.iloc[i + 1, j] = 0
            dfnew.iloc[i, j] = 1

The nested for loop works, but it is very slow on a 1000 x 2000 DataFrame. Is there any way to speed up this manipulation? Thank you!

Upvotes: 0

Views: 38

Answers (1)

juanpa.arrivillaga

Reputation: 95993

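I suspect that using np.where to get the indices, then using iloc on those indices, will be faster than your loop. iloc-based setting has significant overhead per call, but a single call can set many cells quickly; setting individual elements in a loop incurs that overhead once per cell. Two caveats: iloc with two index arrays selects every combination of the listed rows and columns rather than matching (row, column) pairs, so check the result when there are several matches, and idx + 1 assumes no match is in the last row. So try: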

In [30]: df
Out[30]:
   0  1  2
0  1  1  1
1  0  2  3
2  1  1  1

In [31]: idx, idy = np.where(df == 2)

In [32]: df.iloc[idx, idy] = 0

In [33]: df.iloc[idx + 1, idy] = 0

In [34]: idx, idy = np.where(df == 3)

In [35]: df.iloc[idx, idy] = 1

In [36]: df.iloc[idx + 1, idy] = 0

In [37]: df
Out[37]:
   0  1  2
0  1  1  1
1  0  0  1
2  1  0  0
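
For frames with many matches, the same idea can be sketched on the underlying NumPy array with boolean masks, which pair each row with its own column and skip the row below the last one. This is only a sketch under assumptions: the helper name apply_rules is hypothetical, all columns are assumed to share one numeric dtype, and no 2 or 3 is assumed to sit directly below another 2 or 3 (the original loop zeroes such a cell before it can trigger, and this version does not reproduce that).

```python
import numpy as np
import pandas as pd


def apply_rules(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with 2 -> 0 and 3 -> 1,
    and the cell directly below each such match set to 0."""
    # Work on a copy of the values so the input frame is left untouched.
    arr = df.to_numpy(copy=True)

    # Find the matches on the original values, before anything is overwritten.
    is_two = arr == 2
    is_three = arr == 3
    trigger = is_two | is_three

    # Rewrite the matched cells themselves.
    arr[is_two] = 0
    arr[is_three] = 1

    # Zero the cell below each match. Dropping the last mask row means a
    # match in the last row of the frame has no cell below it to change.
    arr[1:][trigger[:-1]] = 0

    return pd.DataFrame(arr, index=df.index, columns=df.columns)


df = pd.DataFrame([[1, 1, 1], [0, 2, 3], [1, 1, 1]])
result = apply_rules(df)
print(result)
```

Boolean masks avoid the per-cell indexing overhead and never combine a row from one match with a column from another. If vertically adjacent matches can occur in your data, compare the output against the loop on a small sample first.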

Upvotes: 1
