Reputation: 532
I am trying to create two new columns in a DataFrame whose values come from calculations on adjacent rows. At the moment I convert the columns of interest to arrays and iterate over them with for-loops.
Assume the following dataframe:
import pandas as pd
import numpy as np
np.random.seed(100)
my_df=pd.DataFrame(np.random.randint(10, size=(6,4)))
my_df.columns=['A', 'x', 'B','y']
my_df.index=[10,30,40,20,60,50]
Is there a "pandas" way to produce the same output as my code below?
xs=np.array(my_df['x'])
diffs=[np.nan]
for i,x in enumerate(xs):
    if i>0:
        diffs.append(xs[i]-xs[i-1])
my_df['diffs']=diffs
ys=np.array(my_df['y'])
ratios=[]
for j,y in enumerate(ys):
    if j>0 and ys[j-1]>=1.5*ys[j]:
        ratios.append(True)
    else:
        ratios.append(False)
my_df['ratios']=ratios
print(my_df)
Output[]:
    A  x  B  y  diffs  ratios
10  8  8  3  7    NaN   False
30  7  0  4  2   -8.0    True
40  5  2  2  2    2.0   False
20  1  0  8  4   -2.0   False
60  0  9  6  2    9.0    True
50  4  1  5  3   -8.0   False
I am aware of iterrows, but I could not get it to work. I would appreciate your input.
Furthermore, if I needed to convert columns 'x' and 'y' into a 2D array like [[8,7],[0,2],[2,2],[0,4],[9,2],[1,3]], could you point me in the right numpy direction?
Thanks in advance :-)
Upvotes: 0
Views: 58
Reputation: 13387
Try this one:
>>> import pandas as pd
>>> import numpy as np
>>>
>>> np.random.seed(100)
>>> my_df=pd.DataFrame(np.random.randint(10, size=(6,4)))
>>> my_df.columns=['A', 'x', 'B','y']
>>> my_df.index=[10,30,40,20,60,50]
>>> my_df["diffs"]=my_df["x"]-my_df["x"].shift(1)
>>> my_df
    A  x  B  y  diffs
10  8  8  3  7    NaN
30  7  0  4  2   -8.0
40  5  2  2  2    2.0
20  1  0  8  4   -2.0
60  0  9  6  2    9.0
50  4  1  5  3   -8.0
>>> my_df["ratios"]=my_df["y"].shift(1)>=1.5 * my_df["y"]
>>> my_df
    A  x  B  y  diffs  ratios
10  8  8  3  7    NaN   False
30  7  0  4  2   -8.0    True
40  5  2  2  2    2.0   False
20  1  0  8  4   -2.0   False
60  0  9  6  2    9.0    True
50  4  1  5  3   -8.0   False
>>>
And to export x and y to a two-column array:
>>> import numpy as np
>>> np.array(my_df[["x", "y"]])
array([[8, 7],
       [0, 2],
       [2, 2],
       [0, 4],
       [9, 2],
       [1, 3]])
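As a side note, the subtraction against `shift(1)` can also be written with `Series.diff()`, and `DataFrame.to_numpy()` is an equivalent built-in way to get the array. A minimal sketch, assuming the same seeded frame as in the question; the names `diffs_alt` and `xy` are only illustrative:

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (same seed, columns and index)
np.random.seed(100)
my_df = pd.DataFrame(np.random.randint(10, size=(6, 4)),
                     columns=['A', 'x', 'B', 'y'],
                     index=[10, 30, 40, 20, 60, 50])

# diff() computes x[i] - x[i-1] and leaves NaN in the first row
diffs_alt = my_df['x'].diff()

# Check that it matches the explicit shift-based subtraction
assert diffs_alt.equals(my_df['x'] - my_df['x'].shift(1))

# Two-column NumPy array built from columns x and y
xy = my_df[['x', 'y']].to_numpy()
print(xy.shape)
```

Whether `diff()` reads better than the explicit `shift` is a matter of taste; the `shift` form is still needed for the `ratios` comparison, since that is not a plain difference.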
Upvotes: 2
Reputation: 323226
We can also do this with zip:
np.array(list(zip(my_df.x,my_df.y)))
Out[810]:
array([[8, 7],
       [0, 2],
       [2, 2],
       [0, 4],
       [9, 2],
       [1, 3]])
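For a large frame, building a Python list of tuples first may be slower than staying in NumPy; `np.column_stack` should give the same array. A small sketch, where `df` is a stand-in frame with the x and y values typed in by hand from the question:

```python
import numpy as np
import pandas as pd

# Stand-in frame holding only the x and y columns from the question
df = pd.DataFrame({'x': [8, 0, 2, 0, 9, 1], 'y': [7, 2, 2, 4, 2, 3]})

# zip-based version: pairs up the values in Python, then converts
via_zip = np.array(list(zip(df.x, df.y)))

# column_stack version: places the two 1-D arrays side by side as columns
via_stack = np.column_stack((df['x'].to_numpy(), df['y'].to_numpy()))

assert np.array_equal(via_zip, via_stack)
print(via_stack.tolist())
# → [[8, 7], [0, 2], [2, 2], [0, 4], [9, 2], [1, 3]]
```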
Upvotes: 1