Reputation: 82

Summing up absolute difference between different rows of values in pandas

I have a pandas DataFrame of measured values, and I'm trying to quantify their symmetry about an axis, i.e., to sum the absolute differences between the values measured on opposite sides of the axis at 'x' == 0.

       x        y
0    -50    -6.24
...
49    -1    -5.05
50     0        0
51     1    -3.95
...
100   50    -5.66

So I want to calculate:

|-6.24 - -5.66| + ... + |-5.05 - -3.95|

That is, the sum of the absolute difference between each 'y' on opposite sides of the axis.

I'm able to do this with some for loops (very slow) or some janky pivot table stuff, but I'm wondering if there's a cleaner, more standard way of doing this in pandas?

Upvotes: 1

Answers (3)

eapetcho

Reputation: 527

Here's another way to approach this kind of problem, using the diff() and abs() methods:

 >>> # Suppose we have the following data:
 >>> import numpy as np
 >>> import pandas as pd
 >>> np.random.seed(1234) # make the following line reproducible
 >>> N = 100
 >>> # Random data for the x and y columns
 >>> x = np.random.randn(N)
 >>> y = np.random.randn(N)
 >>> # Let's construct a dataframe
 >>> df = pd.DataFrame({"x": x, "y": y})
 >>> # We can apply the diff method to the y-column
 >>> dy = df["y"].diff()

When printed, we get:

 >>> dy
 0          NaN
 1     0.275328
 2    -0.062942
 3    -0.218296
 4     0.198992
         ...
 95   -1.535901
 96    0.270413
 97    1.050294
 98   -0.600781
 99   -1.339916
 Name: y, Length: 100, dtype: float64

The absolute value can be computed as follows:

 >>> dy_absval = dy.abs()
 >>> dy_absval
 0          NaN
 1     0.275328
 2     0.062942
 3     0.218296
 4     0.198992
         ...
 95    1.535901
 96    0.270413
 97    1.050294
 98    0.600781
 99    1.339916
 Name: y, Length: 100, dtype: float64

Note that we could have chained diff() and abs() to obtain dy_absval in one step by writing dy_absval = df["y"].diff().abs().

If you want to deal with the NaN that appears in the final result, you can drop it or fill its place with an appropriate value (say, 0.0). That means writing:

 >>> dy_absval = df["y"].diff().abs().dropna()
 >>> # or
 >>> dy_absval = df["y"].diff().abs().fillna(0.0)
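Since the question asks for a single total, the chained result can also be summed directly. The sketch below uses a small made-up Series (the values are illustrative assumptions, not the data above). Series.sum() skips NaN by default, so the leading NaN does not have to be removed first. Keep in mind that diff() compares each row with the row just before it, not with the row mirrored across 'x' == 0.

```python
import pandas as pd

# Illustrative values only (an assumption, not the question's data).
y = pd.Series([1.0, 3.0, 2.0, 6.0], name="y")

# diff() gives [NaN, 2.0, -1.0, 4.0]; abs() removes the signs.
dy_absval = y.diff().abs()

# sum() skips NaN by default (skipna=True), so no dropna() is needed.
total = dy_absval.sum()
print(total)
```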

Upvotes: 0

U13-Forward

Reputation: 71610

Try with loc:

>>> np.abs(df.loc[::-1, 'y'].to_numpy() - df['y'].to_numpy())
array([ 0.58,  1.1 ,  0.  ,  1.1 ,  0.58])
>>>

Or to keep a Series type, use reset_index:

>>> (df.loc[::-1, 'y'].reset_index(drop=True) - df['y'].reset_index(drop=True)).abs()
0    0.58
1    1.10
2    0.00
3    1.10
4    0.58
Name: y, dtype: float64
>>>
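
To reduce this to the single number the question asks for, one option is to sum the result and halve it: subtracting the reversed column from the original visits every mirrored pair twice, and the 'x' == 0 row is paired with itself, contributing 0. A minimal sketch, assuming the rows are sorted by x and x is symmetric about 0; the small frame below is made up for illustration:

```python
import numpy as np
import pandas as pd

# Made-up data, symmetric in x and sorted by x (an assumption).
df = pd.DataFrame({
    "x": [-2, -1, 0, 1, 2],
    "y": [-6.0, -5.0, 0.0, -4.0, -5.5],
})

y = df["y"].to_numpy()

# |y[i] - y[n-1-i]| for every row: each mirrored pair appears twice.
pair_diffs = np.abs(y - y[::-1])

# Halve the sum so each pair is counted once.
total = pair_diffs.sum() / 2
print(total)
```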

Upvotes: 1

haneulkim

Reputation: 4928

y = df["y"].values
rev_y = y[::-1]

np.abs(y-rev_y)

Or you could use the np.flip function:

np.abs(y - np.flip(y))

For more on reversing arrays, you can refer to "Most efficient way to reverse numpy array".
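
Reversing the array pairs rows purely by position, so it relies on the rows being sorted by x with the same number of rows on each side of 0. If that cannot be guaranteed, a possible alternative (a different technique from the reversal above, shown only as a sketch) is to pair rows by |x| with groupby. It assumes every nonzero |x| occurs exactly twice; the data is made up for illustration:

```python
import pandas as pd

# Made-up data, deliberately not sorted by x (an assumption).
df = pd.DataFrame({
    "x": [1, -2, 0, 2, -1],
    "y": [-4.0, -6.0, 0.0, -5.5, -5.0],
})

# Group rows that share the same |x|; with two rows per group,
# max - min equals the absolute difference of the pair.
# The x == 0 group has a single row, so it contributes 0.
per_pair = df.groupby(df["x"].abs())["y"].agg(lambda s: s.max() - s.min())

total = per_pair.sum()
print(total)
```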

Upvotes: 1
