Reputation: 82
I have a pandas DataFrame holding some measured values, and I'm trying to quantify their symmetry across an axis, i.e., to sum the absolute differences between measured values on either side of the axis at 'x' == 0:
x y
0 -50 -6.24
...
49 -1 -5.05
50 0 0
51 1 -3.95
...
100 50 -5.66
So I want to calculate:
|-6.24 - -5.66| + ... + |-5.05 - -3.95|
That is, the sum of the absolute difference between each 'y' on opposite sides of the axis.
I'm able to do this with some for loops (very slow) or some janky pivot table stuff, but I'm wondering if there's a cleaner, more standard way of doing this in pandas?
Upvotes: 1
Views: 823
Reputation: 527
Here's another way to approach this kind of problem, using the diff() and abs() methods:
>>> # Suppose we have the following data:
>>> import numpy as np
>>> import pandas as pd
>>> np.random.seed(1234) # make the following line reproducible
>>> N = 100
>>> # Random data for the x and y columns
>>> x = np.random.randn(N)
>>> y = np.random.randn(N)
>>> # Let's construct a dataframe
>>> df = pd.DataFrame({"x": x, "y": y})
>>> # We can apply the diff method to the y-column
>>> dy = df["y"].diff()
When printed, we get:
>>> dy
0 NaN
1 0.275328
2 -0.062942
3 -0.218296
4 0.198992
...
95 -1.535901
96 0.270413
97 1.050294
98 -0.600781
99 -1.339916
Name: y, Length: 100, dtype: float64
The absolute value can be computed as follows:
>>> dy_absval = dy.abs()
>>> dy_absval
0 NaN
1 0.275328
2 0.062942
3 0.218296
4 0.198992
...
95 1.535901
96 0.270413
97 1.050294
98 0.600781
99 1.339916
Name: y, Length: 100, dtype: float64
Note that we could have chained diff() and abs() to obtain dy_absval directly by writing dy_absval = df["y"].diff().abs()
If you want to deal with the NaN that appears in the final result, you can drop it or fill it with an appropriate value (say, 0.0). That means writing:
>>> dy_absval = df["y"].diff().abs().dropna()
>>> # or
>>> dy_absval = df["y"].diff().abs().fillna(0.0)
Upvotes: 0
Reputation: 71610
Try with loc:
>>> np.abs(df.loc[::-1, 'y'].to_numpy() - df['y'].to_numpy())
array([ 0.58, 1.1 , 0. , 1.1 , 0.58])
>>>
Or to keep a Series type, use reset_index:
>>> (df.loc[::-1, 'y'].reset_index(drop=True) - df['y'].reset_index(drop=True)).abs()
0 0.58
1 1.10
2 0.00
3 1.10
4 0.58
Name: y, dtype: float64
>>>
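To get from that element-wise result to the single number the question asks for, one option is to sum it and halve the total, since every mirrored pair shows up twice in the reversed difference. This is only a minimal sketch: it assumes the rows are sorted by x and that x is symmetric about 0, and the five-row frame is illustrative, built from the rows visible in the question.

```python
import numpy as np
import pandas as pd

# Illustrative data: only the rows shown in the question (an assumption,
# the real frame has 101 rows).
df = pd.DataFrame({
    "x": [-50, -1, 0, 1, 50],
    "y": [-6.24, -5.05, 0.0, -3.95, -5.66],
})

# Make sure the rows run from the most negative x to the most positive x,
# so that reversing y lines each point up with its mirror image.
df = df.sort_values("x").reset_index(drop=True)

y = df["y"].to_numpy()
pairwise = np.abs(y - y[::-1])   # |y(x) - y(-x)| for every row

# Each mirrored pair appears twice (once from each side), so halve the sum.
asymmetry = pairwise.sum() / 2
print(asymmetry)
```

Halving is safe here because the centre row (x == 0) is compared with itself and contributes 0.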
Upvotes: 1
Reputation: 4928
import numpy as np
y = df["y"].values
rev_y = y[::-1]
np.abs(y - rev_y)
Or you could use the np.flip function:
np.abs(y - np.flip(y))
For more on reversing arrays, you can refer to "Most efficient way to reverse numpy array".
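Reversing the array relies on the rows being ordered by x with nothing missing on either side. If that is not guaranteed, a possible variant is to look up each mirrored point by its x value instead of its position. The sketch below assumes the x values are unique; the shuffled five-row frame is illustrative only.

```python
import numpy as np
import pandas as pd

# Illustrative data in arbitrary row order (an assumption for the example).
df = pd.DataFrame({
    "x": [1, -50, 0, 50, -1],
    "y": [-3.95, -6.24, 0.0, -5.66, -5.05],
})

# Index y by x so that values can be looked up by coordinate.
# This assumes every x value occurs exactly once.
s = df.set_index("x")["y"]

# Take the positive side, then fetch the value at -x for each of its points.
pos = s[s.index > 0]
mirror = s.reindex(-pos.index.to_numpy())  # NaN where no mirrored x exists

# Sum |y(x) - y(-x)| over the pairs, ignoring points without a mirror.
asymmetry = np.nansum(np.abs(pos.to_numpy() - mirror.to_numpy()))
print(asymmetry)
```

Because only the positive side is iterated, each pair is counted once and no halving is needed.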
Upvotes: 1