Newbielp

Reputation: 532

Pandas method to iterate over rows and perform calculations using values of previous row

I am trying to create two new columns in a DataFrame whose values are computed from consecutive rows. At the moment I convert the columns of interest to arrays and iterate over them with for-loops.

Assume the following dataframe:

import pandas as pd
import numpy as np

np.random.seed(100)
my_df=pd.DataFrame(np.random.randint(10, size=(6,4)))
my_df.columns=['A', 'x', 'B','y']
my_df.index=[10,30,40,20,60,50]

Is there a pandas-native way to produce the same output as my code below?

xs=np.array(my_df['x'])
diffs=[np.nan]
for i,x in enumerate(xs):
    if i>0:
        diffs.append(xs[i]-xs[i-1])
my_df['diffs']=diffs 

ys=np.array(my_df['y'])
ratios=[]
for j,y in enumerate(ys):
    if j>0 and ys[j-1]>=1.5*ys[j]:
        ratios.append(True)
    else:
        ratios.append(False)     
my_df['ratios']=ratios
print(my_df)

Output[]:
    A  x  B  y  diffs  ratios
10  8  8  3  7    NaN   False
30  7  0  4  2   -8.0    True
40  5  2  2  2    2.0   False
20  1  0  8  4   -2.0   False
60  0  9  6  2    9.0    True
50  4  1  5  3   -8.0   False

I am aware of iterrows, but I could not get it to work. I would appreciate your input.

Furthermore, if I needed to convert columns 'x' and 'y' into a 2D array like this: [[8,7],[0,2],[2,2],[0,4],[9,2],[1,3]], could you point me in the right direction with numpy?

Thanks in advance :-)

Upvotes: 0

Views: 58

Answers (2)

Georgina Skibinski

Reputation: 13387

Try shift, which lines each row up with the previous one:

>>> import pandas as pd
>>> import numpy as np
>>>
>>> np.random.seed(100)
>>> my_df=pd.DataFrame(np.random.randint(10, size=(6,4)))
>>> my_df.columns=['A', 'x', 'B','y']
>>> my_df.index=[10,30,40,20,60,50]
>>> my_df["diffs"]=my_df["x"]-my_df["x"].shift(1)
>>> my_df
    A  x  B  y  diffs
10  8  8  3  7    NaN
30  7  0  4  2   -8.0
40  5  2  2  2    2.0
20  1  0  8  4   -2.0
60  0  9  6  2    9.0
50  4  1  5  3   -8.0
>>> my_df["ratios"]=my_df["y"].shift(1)>=1.5 * my_df["y"]
>>> my_df
    A  x  B  y  diffs  ratios
10  8  8  3  7    NaN   False
30  7  0  4  2   -8.0    True
40  5  2  2  2    2.0   False
20  1  0  8  4   -2.0   False
60  0  9  6  2    9.0    True
50  4  1  5  3   -8.0   False
>>>
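
For the first column, `Series.diff()` is a built-in shorthand for subtracting the shifted column. Here is a minimal sketch. To keep it reproducible, I am assuming it is fine to rebuild the frame from the values printed in the question rather than from the random seed:

```python
import numpy as np
import pandas as pd

# Assumption: values copied from the output printed in the question,
# instead of regenerating them with np.random.seed(100).
my_df = pd.DataFrame(
    {
        "A": [8, 7, 5, 1, 0, 4],
        "x": [8, 0, 2, 0, 9, 1],
        "B": [3, 4, 2, 8, 6, 5],
        "y": [7, 2, 2, 4, 2, 3],
    },
    index=[10, 30, 40, 20, 60, 50],
)

# diff() subtracts the previous row from the current one; the first row is NaN.
my_df["diffs"] = my_df["x"].diff()

# shift(1) moves values down one row by position, not by index label,
# so the unsorted index does not matter here.
# The first row compares NaN against a number, which evaluates to False,
# the same result the loop gives for its first element.
my_df["ratios"] = my_df["y"].shift(1) >= 1.5 * my_df["y"]

print(my_df)
```

Both operations are vectorised, so there is no need for iterrows or an explicit loop.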

And to export x and y as a two-column array:

>>> import numpy as np
>>> np.array(my_df[["x", "y"]])
array([[8, 7],
       [0, 2],
       [2, 2],
       [0, 4],
       [9, 2],
       [1, 3]])
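
`DataFrame.to_numpy()` should give the same array without the extra `np.array` call, and `.tolist()` turns it into the nested Python list written in the question. A short sketch, again assuming the x and y values printed in the question:

```python
import pandas as pd

# Assumption: x and y values copied from the question's printed output.
my_df = pd.DataFrame(
    {"x": [8, 0, 2, 0, 9, 1], "y": [7, 2, 2, 4, 2, 3]},
    index=[10, 30, 40, 20, 60, 50],
)

# Select the two columns and convert them to a NumPy array of shape (6, 2).
xy = my_df[["x", "y"]].to_numpy()

# Convert to a plain nested Python list if that is what you need.
xy_list = xy.tolist()
print(xy_list)
```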

Upvotes: 2

BENY

Reputation: 323226

You can also pair the two columns with zip:

np.array(list(zip(my_df.x,my_df.y)))
Out[810]: 
array([[8, 7],
       [0, 2],
       [2, 2],
       [0, 4],
       [9, 2],
       [1, 3]])

Upvotes: 1
