Reputation: 75
Is there anyway to compare values within the same column of a pandas DataFrame?
The task at hand is something like this:
import pandas as pd
data = pd.DataFrame({"A": [0,-5,2,3,-3,-4,-4,-2,-1,5,6,7,3,-1]});
I need to find the maximum time (in indices) consecutive +/- values appear (Equivalently checking consecutive values because the sign can be encoded by True/False). The above data should yield 5 because there are 5 consecutive negative integers [-3,-4,-4,-2,-1]
If possible, I was hoping to avoid using a loop because the number of data points in the column may very well exceed millions in order.
I've tried using data.A.rolling()
and it's variants, but can't seem to figure out any possible way to do this in a vectorized way.
Any suggestions?
Upvotes: 1
Views: 598
Reputation: 221504
Here's a NumPy approach that computes the max interval lengths for the positive and negative values -
def max_interval_lens(arr):
# Store mask of positive values
pos_mask = arr>=0
# Get indices of shifts
idx = np.r_[0,np.flatnonzero(pos_mask[1:] != pos_mask[:-1])+1, arr.size]
# Return max of intervals
lens = np.diff(idx)
s = int(pos_mask[0])
maxs = [0,0]
if len(lens)==1:
maxs[1-s] = lens[0]
else:
maxs = lens[1-s::2].max(), lens[s::2].max()
return maxs # Positive, negative max lens
Sample run -
In [227]: data
Out[227]:
A
0 0
1 -5
2 2
3 3
4 -3
5 -4
6 -4
7 -2
8 -1
9 5
10 6
11 7
12 3
13 -1
In [228]: max_interval_lens(data['A'].values)
Out[228]: (4, 5)
Upvotes: 2