nate
nate

Reputation: 522

Evaluate Row/Index Range of Identical Column Values in Pandas Dataframe

Before going any further, there are a lot of answers relating to the question I am about to ask. As far as I can tell, none directly resolve my issue in a coherent manner.

Here is an example of my data:

Index     API     Measurement
    0     5       1000   
    1     5       2000
    2     5       30.1
    3     10      12
    4     12      0
    5     1       3123.1
    6     1       101.1
    7     100     3
    8     23      0
    9     23      123212.1

Here is what I want to do. I want to - as simple and efficiently as possible - figure out the index RANGE of repeated API values.

Desired output:

API Index_RANGE
5   (0,2)
10  3
12  4
1   (5,6)
100 7
23  (8,9)

How do I go about accomplishing this?

Upvotes: 2

Views: 189

Answers (2)

ALollz
ALollz

Reputation: 59579

If you require tuples or a single number, I'd write your own function, and pass It to .agg. Form groups finding where 'API' changes:

def Index_RANGE(x):
    minx = x.index.min()
    maxx = x.index.max()
    if minx != maxx:
        return (minx, maxx)
    else:
        return minx

(df.groupby((df.API != df.API.shift(1)).cumsum()).API.agg(['first', Index_RANGE])
    .set_index('first').rename_axis('API'))

Output

    Index_RANGE
API            
5        (0, 2)
10            3
12            4
1        (5, 6)
100           7
23       (8, 9)

Upvotes: 3

Peter Leimbigler
Peter Leimbigler

Reputation: 11105

The following code produces your expected output exactly, but is not performant; it will slow down badly with increasing input data size (because of the list comprehension).

# Assume all repeated API values are contiguous;
# if not, first sort df by API
grouped = (df.groupby('API', sort=False)['Index']
            .agg(['first', 'last'])
            .itertuples(index=False, name=None))

pd.Series([x if x[0] != x[1] else x[0] for x in grouped],
           index=df.API.unique(), name='Index_RANGE')

5      (0, 2)
10          3
12          4
1      (5, 6)
100         7
23     (8, 9)
Name: Index_RANGE, dtype: object

Upvotes: 1

Related Questions