Reputation: 17164
I was trying to find where the function crosses the line x=0. I utilized the fact that when the function crosses x-axis, its sign changes.
Now, I have a dataframe like this, I want to find the TWO rows that are closest to zeros, given that function crosses the x-axis at two points.
A value
0 105 0.662932
1 105 0.662932
2 107 0.052653 # sign changes here when A is 107
3 108 -0.228060 # among these two A 107 is closer to zero
4 110 -0.740819
5 112 -1.188906
6 142 -0.228060 # sign changes here when A is 142
7 143 0.052654 # among these two, A 143 is closer to zero
8 144 0.349638
Required output:
A value
2 107 0.052653
7 143 0.052654
Upvotes: 1
Views: 2454
Reputation: 17164
I managed get a simple solution:
import numpy as np
import pandas as pd
data = [
[105, 0.662932],
[105, 0.662932],
[107, 0.052653], # sign changes between here
[108, -0.228060], # and here; first row has `value` closer to 0
[110, -0.740819],
[112, -1.188906],
[142, -0.228060], # sign changes between here
[143, 0.052654], # and here; second row has `value` closer to 0
[144, 0.349638],
]
df = pd.DataFrame(data, columns=["A", "value"]
Solution
def find_closest_to_zero_idx(arr):
fx = np.zeros(len(arr))
fy = np.array(arr)
# lower index when sign changes in array
idx = np.argwhere((np.diff(np.sign(fx - fy)) != 0) )
nearest_to_zero = []
# test two values before and after zero which is nearer to zero
for i in range(len(idx)):
if abs(arr[idx[i][0]]) < abs(arr[idx[i][0]+1]):
nearer = idx[i][0]
nearest_to_zero.append(nearer)
else:
nearer = idx[i][0]+1
nearest_to_zero.append(nearer)
return nearest_to_zero
idx = find_closest_to_zero_idx(df.value.values)
Result
idx = find_closest_to_zero_idx(df.value.values)
df.loc[idx]
A value
2 107 0.052653
7 143 0.052654
Slow but pure pandas method
df['value_shifted'] = df.value.shift(-1)
df['sign_changed'] = np.sign(df.value.values) * np.sign(df.value_shifted.values)
# lower index where sign changes
idx = df[df.sign_changed == -1.0].index.values
# make both lower and upper index from the a-axis negative so that
# we can groupby later.
for i in range(len(idx)):
df.loc[ [idx[i], idx[i]+1], 'sign_changed'] = -1.0 * (i+1)
df1 = df[ np.sign(df.sign_changed) == -1.0]
df2 = df1.groupby('sign_changed')['value'].apply(lambda x: min(abs(x)))
df3 = df2.reset_index()
answer = df.merge(df3,on=['sign_changed','value'])
answer
A value value_shifted sign_changed
0 107 0.052653 -0.228060 -1.0
1 143 0.052654 0.349638 -2.0
Upvotes: 1
Reputation: 10306
import pandas as pd
data = [
[105, 0.662932],
[105, 0.662932],
[107, 0.052653], # sign changes between here
[108, -0.228060], # and here; first row has `value` closer to 0
[110, -0.740819],
[112, -1.188906],
[142, -0.228060], # sign changes between here
[143, 0.052654], # and here; second row has `value` closer to 0
[144, 0.349638],
]
df = pd.DataFrame(data, columns=["A", "value"])
# where the sign is the same between two elements, the diff is 0
# otherwise, it's either 2 or -2 (doesn't matter which for this use case)
# use periods=1 and =-1 to do a diff forwards and backwards
sign = df.value.map(np.sign)
diff1 = sign.diff(periods=1).fillna(0)
diff2 = sign.diff(periods=-1).fillna(0)
# now we have the locations where sign changes occur. We just need to extract
# the `value` values at those locations to determine which of the two possibilities
# to choose for each sign change (whichever has `value` closer to 0)
df1 = df.loc[diff1[diff1 != 0].index]
df2 = df.loc[diff2[diff2 != 0].index]
idx = np.where(abs(df1.value.values) < abs(df2.value.values), df1.index.values, df2.index.values)
df.loc[idx]
A value
2 107 0.052653
7 143 0.052654
Thanks to @Vince W. for mentioning one should use np.where
; I was going with a more convoluted approach initially.
EDIT - see @useruser3483203's answer below which is much faster than this. You can improve even a bit more (2x as fast when I reran their timings) by doing the first few operations (diff, abs, compare equality) on a numpy array instead of a pandas Series too. numpy's diff
is different than the one in pandas, though, in that it drops the first element instead of returning NaN
for it. This means we get back the index of the first row of the sign change, instead of the second, and need to add one to get the next row.
def find_min_sign_changes(df):
vals = df.value.values
abs_sign_diff = np.abs(np.diff(np.sign(vals)))
# idx of first row where the change is
change_idx = np.flatnonzero(abs_sign_diff == 2)
# +1 to get idx of second rows in the sign change too
change_idx = np.stack((change_idx, change_idx + 1), axis=1)
# now we have the locations where sign changes occur. We just need to extract
# the `value` values at those locations to determine which of the two possibilities
# to choose for each sign change (whichever has `value` closer to 0)
min_idx = np.abs(vals[change_idx]).argmin(1)
return df.iloc[change_idx[range(len(change_idx)), min_idx]]
Upvotes: 3
Reputation: 51155
You can generalize the approach using numpy
:
a = df.value.values
u = np.sign(df.value)
m = np.flatnonzero(u.diff().abs().eq(2))
g = np.stack([m-1, m], axis=1)
v = np.abs(a[g]).argmin(1)
df.iloc[g[np.arange(g.shape[0]), v]]
A value
2 107 0.052653
7 143 0.052654
This solution is also going to be a lot more efficient, especially as the size scales.
In [122]: df = pd.concat([df]*100)
In [123]: %timeit chris(df)
870 µs ± 10 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [124]: %timeit nathan(df)
2.03 s ± 10.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [125]: %timeit df.loc[find_closest_to_zero_idx(df.value.values)]
1.81 ms ± 12.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Upvotes: 3