Reputation: 522
Before going any further, there are a lot of answers relating to the question I am about to ask. As far as I can tell, none directly resolve my issue in a coherent manner.
Here is an example of my data:
Index API Measurement
0 5 1000
1 5 2000
2 5 30.1
3 10 12
4 12 0
5 1 3123.1
6 1 101.1
7 100 3
8 23 0
9 23 123212.1
Here is what I want to do. I want to - as simple and efficiently as possible - figure out the index RANGE of repeated API values.
Desired output:
API Index_RANGE
5 (0,2)
10 3
12 4
1 (5,6)
100 7
23 (8,9)
How do I go about accomplishing this?
Upvotes: 2
Views: 189
Reputation: 59579
If you require tuples or a single number, I'd write your own function, and pass It to .agg
. Form groups finding where 'API'
changes:
def Index_RANGE(x):
minx = x.index.min()
maxx = x.index.max()
if minx != maxx:
return (minx, maxx)
else:
return minx
(df.groupby((df.API != df.API.shift(1)).cumsum()).API.agg(['first', Index_RANGE])
.set_index('first').rename_axis('API'))
Index_RANGE
API
5 (0, 2)
10 3
12 4
1 (5, 6)
100 7
23 (8, 9)
Upvotes: 3
Reputation: 11105
The following code produces your expected output exactly, but is not performant; it will slow down badly with increasing input data size (because of the list comprehension).
# Assume all repeated API values are contiguous;
# if not, first sort df by API
grouped = (df.groupby('API', sort=False)['Index']
.agg(['first', 'last'])
.itertuples(index=False, name=None))
pd.Series([x if x[0] != x[1] else x[0] for x in grouped],
index=df.API.unique(), name='Index_RANGE')
5 (0, 2)
10 3
12 4
1 (5, 6)
100 7
23 (8, 9)
Name: Index_RANGE, dtype: object
Upvotes: 1