shunyo

Reputation: 1307

Subtracting over a range of columns for two rows in pandas dataframe python

I have a dataframe with six columns: presence, x, y, vx, vy and lane. I would like to compute the difference between two rows over a subset of columns [x, y, vx, vy]. However, subtracting gives me NaN. Thanks for the help.

import pandas as pd
data = {'presence': [1, 1, 0, 1],
        'x': [17, 35, 46, 57],
        'y': [4, 4, 8, 0],
        'vx': [2, 5, 9, 12],
        'vy': [0.3, 0.5, 0.2, 0], 
        'lane': [0, 1, 2, 0]}
df = pd.DataFrame(data)
a = df.iloc[[2]]
b = df.iloc[[1]]
diff_x = b[['x','y']] - a[['x','y']] # Gives a 2x2 DataFrame full of NaN
# Expected output: 11  4  (row 2 minus row 1)
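To see where the NaNs come from, here is a minimal sketch (reusing the same data) that prints the index of each slice and of the result. The two one-row frames carry the labels 2 and 1, so the subtraction returns the union of both labels with no pair of rows to match up.

```python
import pandas as pd

data = {'presence': [1, 1, 0, 1],
        'x': [17, 35, 46, 57],
        'y': [4, 4, 8, 0],
        'vx': [2, 5, 9, 12],
        'vy': [0.3, 0.5, 0.2, 0],
        'lane': [0, 1, 2, 0]}
df = pd.DataFrame(data)

a = df.iloc[[2]]  # one-row DataFrame with index label 2
b = df.iloc[[1]]  # one-row DataFrame with index label 1

result = b[['x', 'y']] - a[['x', 'y']]
print(a.index.tolist(), b.index.tolist())  # [2] [1]
print(result.index.tolist())               # [1, 2]: union of both indexes
print(result.isna().all().all())           # True: no shared label, so every cell is NaN
```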

Upvotes: 2

Views: 941

Answers (3)

jkr

Reputation: 19280

You can use .loc style indexing to get a pandas.Series for a certain row index and column names. Then you can subtract those two series.

If you expect to get 11 and 4 as your output, you will have to reverse your subtraction operation from your post.

diff_x = df.loc[2, ["x", "y"]] - df.loc[1, ["x", "y"]]

# x    11.0
# y     4.0
# dtype: float64
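The question asks about the whole range [x, y, vx, vy]. A minimal sketch extending the same .loc approach, assuming the columns stay in the order given in the question's `data` dict; label slicing with .loc includes both endpoints, so `"x":"vy"` selects x, y, vx and vy.

```python
import pandas as pd

df = pd.DataFrame({'presence': [1, 1, 0, 1],
                   'x': [17, 35, 46, 57],
                   'y': [4, 4, 8, 0],
                   'vx': [2, 5, 9, 12],
                   'vy': [0.3, 0.5, 0.2, 0],
                   'lane': [0, 1, 2, 0]})

# Label-based column slice: both endpoints are included, so this covers x, y, vx, vy.
# Each side is a Series indexed by column name, so the subtraction aligns on columns.
diff = df.loc[2, "x":"vy"] - df.loc[1, "x":"vy"]
print(diff)
```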

Upvotes: 1

anky

Reputation: 75080

pandas aligns arithmetic on the index, so convert to arrays and then subtract:

a = df.iloc[[2]]
b = df.iloc[[1]]
diff_x = a[['x','y']].to_numpy() - b[['x','y']].to_numpy()
#array([[11,  4]], dtype=int64)
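Because `to_numpy()` drops the labels, the result is a bare 1x2 array. A minimal sketch for the full column range that re-attaches the column names afterwards; `cols` is a hypothetical helper name chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'presence': [1, 1, 0, 1],
                   'x': [17, 35, 46, 57],
                   'y': [4, 4, 8, 0],
                   'vx': [2, 5, 9, 12],
                   'vy': [0.3, 0.5, 0.2, 0],
                   'lane': [0, 1, 2, 0]})

cols = ['x', 'y', 'vx', 'vy']  # hypothetical helper: columns to difference
a = df.iloc[[2]]
b = df.iloc[[1]]

# Plain arrays ignore the index labels 2 and 1, so this is a positional subtraction.
arr = a[cols].to_numpy() - b[cols].to_numpy()  # shape (1, 4)

# Put the column names back on the single result row.
diff = pd.Series(arr[0], index=cols)
print(diff)
```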

Alternatively, for two consecutive rows, you can use diff:

df[['x','y']].diff().iloc[2]

# x    11.0
# y     4.0
# Name: 2, dtype: float64
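`diff` is not limited to neighbouring rows: its `periods` argument sets how many rows back to look. A minimal sketch computing row 3 minus row 1 over the full column range (rows are counted by position, which matches the labels in this default RangeIndex).

```python
import pandas as pd

df = pd.DataFrame({'presence': [1, 1, 0, 1],
                   'x': [17, 35, 46, 57],
                   'y': [4, 4, 8, 0],
                   'vx': [2, 5, 9, 12],
                   'vy': [0.3, 0.5, 0.2, 0],
                   'lane': [0, 1, 2, 0]})

cols = ['x', 'y', 'vx', 'vy']
# periods=2 subtracts the row two positions earlier, so position 3 holds row 3 - row 1.
diff = df[cols].diff(periods=2).iloc[3]
print(diff)
```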

Upvotes: 1

BENY

Reputation: 323306

This happens because you pulled out a and b as DataFrames, not Series:

a
Out[312]: 
   presence   x  y  vx   vy  lane
2         0  46  8   9  0.2     2
b
Out[313]: 
   presence   x  y  vx   vy  lane
1         1  35  4   5  0.5     1

The two DataFrames above have different index labels (2 and 1). When you do arithmetic, pandas aligns on the index first, and wherever the labels do not match, the output is NaN.
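The alignment rule can be checked directly: once both one-row frames carry the same index label, the subtraction lines up without leaving pandas. A minimal sketch that relabels both rows to 0 with `reset_index(drop=True)`:

```python
import pandas as pd

df = pd.DataFrame({'presence': [1, 1, 0, 1],
                   'x': [17, 35, 46, 57],
                   'y': [4, 4, 8, 0],
                   'vx': [2, 5, 9, 12],
                   'vy': [0.3, 0.5, 0.2, 0],
                   'lane': [0, 1, 2, 0]})

a = df.iloc[[2]]
b = df.iloc[[1]]

# Drop the original labels (2 and 1) so both rows become label 0 and align.
diff = a[['x', 'y']].reset_index(drop=True) - b[['x', 'y']].reset_index(drop=True)
print(diff)
```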

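Quick fix (NumPy arrays ignore the index; note that b - a gives -11 and -4, so swap the operands to get 11 and 4):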

diff_x = b[['x','y']].values - a[['x','y']].values
diff_x
Out[311]: array([[-11,  -4]], dtype=int64)
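Since the root cause is that `df.iloc[[2]]` (double brackets) returns a one-row DataFrame, another option is single brackets, which return a Series indexed by column name. Two such Series align on the column labels, so the row labels no longer matter. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'presence': [1, 1, 0, 1],
                   'x': [17, 35, 46, 57],
                   'y': [4, 4, 8, 0],
                   'vx': [2, 5, 9, 12],
                   'vy': [0.3, 0.5, 0.2, 0],
                   'lane': [0, 1, 2, 0]})

a = df.iloc[2]  # Series: index is the column names (mixed dtypes are upcast to float)
b = df.iloc[1]

# Both Series share the labels 'x' and 'y', so they subtract element-wise.
diff = a[['x', 'y']] - b[['x', 'y']]
print(diff)
```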

Upvotes: 1
