Reputation: 1307
I have a dataframe with six columns: presence, x, y, vx, vy and lane. I would like to compute the difference between two rows over a range of columns [x, y, vx, vy]. However, subtracting gives me NaN. Thanks for the help.
import pandas as pd
data = {'presence': [1, 1, 0, 1],
'x': [17, 35, 46, 57],
'y': [4, 4, 8, 0],
'vx': [2, 5, 9, 12],
'vy': [0.3, 0.5, 0.2, 0],
'lane': [0, 1, 2, 0]}
df = pd.DataFrame(data)
a = df.iloc[[2]]
b = df.iloc[[1]]
diff_x = b[['x','y']] - a[['x','y']] # Gives two rows and two columns of nan
# Expected output: 11 4
Upvotes: 2
Views: 941
Reputation: 19280
You can use .loc-style indexing to get a pandas.Series for a given row label and a list of column names. Both Series are then indexed by the column names, so subtracting them lines up x with x and y with y instead of producing NaN.
If you expect to get 11 and 4 as your output, you will have to reverse the subtraction from your post.
diff_x = df.loc[2, ["x", "y"]] - df.loc[1, ["x", "y"]]
# x 11.0
# y 4.0
# dtype: float64
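The question asks for the whole range [x, y, vx, vy], not just x and y. A minimal sketch that wraps the same .loc subtraction in a small helper; `row_diff` is a hypothetical name used only for illustration, and the data is copied from the question.

```python
import pandas as pd

# Data copied from the question.
data = {'presence': [1, 1, 0, 1],
        'x': [17, 35, 46, 57],
        'y': [4, 4, 8, 0],
        'vx': [2, 5, 9, 12],
        'vy': [0.3, 0.5, 0.2, 0],
        'lane': [0, 1, 2, 0]}
df = pd.DataFrame(data)


def row_diff(frame, later, earlier, cols):
    """Hypothetical helper: frame.loc[later, cols] - frame.loc[earlier, cols].

    Each .loc call returns a Series indexed by the column names, so the
    subtraction aligns on 'x', 'y', ... rather than on the row labels.
    """
    return frame.loc[later, cols] - frame.loc[earlier, cols]


cols = ['x', 'y', 'vx', 'vy']
diff = row_diff(df, 2, 1, cols)  # row 2 minus row 1
print(diff)
```

Because the result is indexed by column name, adding or removing entries in `cols` needs no other change.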
Upvotes: 1
Reputation: 75080
pandas aligns arithmetic on the index, so convert to NumPy arrays and then subtract:
a = df.iloc[[2]]
b = df.iloc[[1]]
diff_x = a[['x','y']].to_numpy() - b[['x','y']].to_numpy()
#array([[11, 4]], dtype=int64)
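If you would rather stay in pandas than drop to NumPy, one alternative (not part of the answer above) is to discard the original row labels with reset_index(drop=True), so both one-row frames share the index [0] and the subtraction lines up. A sketch using the question's data:

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({'presence': [1, 1, 0, 1],
                   'x': [17, 35, 46, 57],
                   'y': [4, 4, 8, 0],
                   'vx': [2, 5, 9, 12],
                   'vy': [0.3, 0.5, 0.2, 0],
                   'lane': [0, 1, 2, 0]})

a = df.iloc[[2]]
b = df.iloc[[1]]

# Drop the original labels (2 and 1); both frames get the fresh index [0],
# so pandas matches the rows instead of filling NaN.
diff_df = a[['x', 'y']].reset_index(drop=True) - b[['x', 'y']].reset_index(drop=True)
print(diff_df)
```

Unlike the NumPy version, the result keeps the x and y column labels.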
Alternatively, for two consecutive rows, you can use diff:
df[['x','y']].diff().iloc[2]
x 11.0
y 4.0
Name: 2, dtype: float64
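diff compares each row with the row `periods` positions earlier, so the same idea also covers rows that are not adjacent. A sketch assuming the default RangeIndex from the question (diff shifts by position, not by label):

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({'presence': [1, 1, 0, 1],
                   'x': [17, 35, 46, 57],
                   'y': [4, 4, 8, 0],
                   'vx': [2, 5, 9, 12],
                   'vy': [0.3, 0.5, 0.2, 0],
                   'lane': [0, 1, 2, 0]})

cols = ['x', 'y', 'vx', 'vy']

# periods=1 (the default): row 2 minus row 1.
step1 = df[cols].diff().loc[2]

# periods=2: row 3 minus row 1, i.e. rows two positions apart.
step2 = df[cols].diff(periods=2).loc[3]

print(step1)
print(step2)
```

The first `periods` rows of the diff result are NaN, which is why the output dtype is float64.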
Upvotes: 1
Reputation: 323306
This happens because you pulled out a and b as DataFrames, not Series:
a
Out[312]:
presence x y vx vy lane
2 0 46 8 9 0.2 2
b
Out[313]:
presence x y vx vy lane
1 1 35 4 5 0.5 1
The indexes of the two DataFrames above are different (2 vs. 1). When you do arithmetic, pandas aligns on the index first; where the labels do not match, the result is NaN.
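A small reproduction of that alignment using the question's data: the NaN result carries the union of both row labels, and relabelling one side with rename makes the labels match. The rename is only there to illustrate the mechanism, not as a recommended fix.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({'presence': [1, 1, 0, 1],
                   'x': [17, 35, 46, 57],
                   'y': [4, 4, 8, 0],
                   'vx': [2, 5, 9, 12],
                   'vy': [0.3, 0.5, 0.2, 0],
                   'lane': [0, 1, 2, 0]})

a = df.iloc[[2]][['x', 'y']]  # index [2]
b = df.iloc[[1]][['x', 'y']]  # index [1]

# Labels {1} and {2} never overlap: the result has rows 1 and 2, all NaN.
nan_result = b - a
print(nan_result)

# Give a the same label as b and the values subtract normally.
matched = b - a.rename(index={2: 1})
print(matched)
```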
Quick fix:
diff_x = b[['x','y']].values - a[['x','y']].values
diff_x
Out[311]: array([[-11, -4]], dtype=int64)
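Converting to NumPy drops the column labels. A sketch of another option, assuming you want to keep a and b as one-row DataFrames: squeeze(axis=0) turns each into a Series indexed by the column names, so the subtraction aligns on x and y.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({'presence': [1, 1, 0, 1],
                   'x': [17, 35, 46, 57],
                   'y': [4, 4, 8, 0],
                   'vx': [2, 5, 9, 12],
                   'vy': [0.3, 0.5, 0.2, 0],
                   'lane': [0, 1, 2, 0]})

a = df.iloc[[2]]
b = df.iloc[[1]]

# squeeze(axis=0) collapses the single row, leaving a Series indexed by 'x', 'y'.
a_row = a[['x', 'y']].squeeze(axis=0)
b_row = b[['x', 'y']].squeeze(axis=0)

diff = a_row - b_row  # aligned on column names, so no NaN
print(diff)
```

Passing axis=0 explicitly keeps the result a Series even if only one column is selected; a bare squeeze() on a 1x1 frame would return a scalar instead.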
Upvotes: 1