tiny_rick
tiny_rick

Reputation: 75

Comparing values within the same dataframe column

Is there anyway to compare values within the same column of a pandas DataFrame?

The task at hand is something like this:

import pandas as pd
data = pd.DataFrame({"A": [0,-5,2,3,-3,-4,-4,-2,-1,5,6,7,3,-1]});

I need to find the maximum time (in indices) consecutive +/- values appear (Equivalently checking consecutive values because the sign can be encoded by True/False). The above data should yield 5 because there are 5 consecutive negative integers [-3,-4,-4,-2,-1]

If possible, I was hoping to avoid using a loop because the number of data points in the column may very well exceed millions in order.

I've tried using data.A.rolling() and it's variants, but can't seem to figure out any possible way to do this in a vectorized way.

Any suggestions?

Upvotes: 1

Views: 598

Answers (1)

Divakar
Divakar

Reputation: 221504

Here's a NumPy approach that computes the max interval lengths for the positive and negative values -

def max_interval_lens(arr):
    # Store mask of positive values
    pos_mask = arr>=0

    # Get indices of shifts
    idx = np.r_[0,np.flatnonzero(pos_mask[1:] != pos_mask[:-1])+1, arr.size]

    # Return max of intervals
    lens = np.diff(idx)
    s = int(pos_mask[0])
    maxs = [0,0]    
    if len(lens)==1:
        maxs[1-s] = lens[0]
    else:
        maxs = lens[1-s::2].max(), lens[s::2].max() 

    return maxs # Positive, negative max lens

Sample run -

In [227]: data
Out[227]: 
    A
0   0
1  -5
2   2
3   3
4  -3
5  -4
6  -4
7  -2
8  -1
9   5
10  6
11  7
12  3
13 -1

In [228]: max_interval_lens(data['A'].values)
Out[228]: (4, 5)

Upvotes: 2

Related Questions