Reputation: 874
I have the following DataFrame:
import pandas as pd
df = pd.DataFrame({'index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'X': [0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0]})
df.set_index('index', inplace=True)
       X
index   
0      0
1      0
2      1
3      1
4      0
5      0
6      1
7      1
8      1
9      0
10     0
What I need is a list of tuples holding the index values of the first and last 1 in each run of consecutive 1s. For example:
Want:
[(2,3), (6,8)]
The first run of 1s starts at index 2 and ends at index 3. The next run starts at index 6 and ends at index 8.
What I've tried:
I can get the first tuple with NumPy's argmax:
import numpy as np
x1 = np.argmax(df.values)         # position of the first 1
y1 = np.argmin(df.values[x1:])    # offset of the first 0 after it
(x1, x1 + y1 - 1)
This gives me the first tuple, but repeating it for every run seems messy, and I suspect there's a better way.
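The iteration described above can also be written as a single pass over the column, without argmax. A plain-Python sketch for reference; the helper name runs_of_ones is hypothetical, and it returns index labels rather than positions:

```python
import pandas as pd

def runs_of_ones(s):
    """Return (first, last) index labels for each run of consecutive 1s in s."""
    result = []
    start = prev = None
    for label, value in s.items():
        if value == 1:
            if start is None:
                start = label  # a new run begins
            prev = label       # extend the current run
        elif start is not None:
            result.append((start, prev))  # the run just ended
            start = None
    if start is not None:
        result.append((start, prev))  # a run that reaches the last row
    return result

df = pd.DataFrame({'index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'X': [0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0]}).set_index('index')
print(runs_of_ones(df['X']))
# [(2, 3), (6, 8)]
```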
Upvotes: 2
Views: 223
Reputation: 153460
Here's a pure pandas solution:
df.groupby(df['X'].eq(0).cumsum().mask(df['X'].eq(0)))\
.apply(lambda x: (x.first_valid_index(),x.last_valid_index()))\
.tolist()
Output:
[(2, 3), (6, 8)]
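The grouping key is the whole trick here. This sketch rebuilds the question's DataFrame so it runs on its own, prints the intermediate key, and then reuses it on the index labels with agg(['first', 'last']); that last step is a variant of the apply call above, not the answer's exact code:

```python
import pandas as pd

df = pd.DataFrame({'index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'X': [0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0]}).set_index('index')

is_zero = df['X'].eq(0)
# cumsum() increments on every 0, so all 1s in the same run share one value;
# mask() replaces the 0 rows with NaN, and groupby drops NaN keys by default
key = is_zero.cumsum().mask(is_zero)
print(key.tolist())
# [nan, nan, 2.0, 2.0, nan, nan, 4.0, 4.0, 4.0, nan, nan]

# Same key, applied to the index labels instead of the whole DataFrame
bounds = df.index.to_series().groupby(key).agg(['first', 'last'])
pairs = [(int(a), int(b)) for a, b in zip(bounds['first'], bounds['last'])]
print(pairs)
# [(2, 3), (6, 8)]
```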
Upvotes: 1
Reputation: 51155
You can use a third-party library, more_itertools.
Using loc with mit.consecutive_groups:
import more_itertools as mit
x = [list(group) for group in mit.consecutive_groups(df.loc[df['X'] == 1].index)]
# [[2, 3], [6, 7, 8]]
Then a simple list comprehension keeps the first and last element of each group:
x = [(i[0], i[-1]) for i in x]
# [(2, 3), (6, 8)]
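If you'd rather not add the dependency, the same grouping of consecutive integers can be sketched with the standard library: within a run of consecutive integers, value minus position is constant, so itertools.groupby on that difference splits the runs. The helper first_last_of_consecutive is hypothetical:

```python
from itertools import groupby

import pandas as pd

def first_last_of_consecutive(labels):
    """Return (first, last) for each run of consecutive integers in labels."""
    out = []
    # value - position stays constant inside a run of consecutive integers
    for _, grp in groupby(enumerate(labels), key=lambda t: t[1] - t[0]):
        items = [v for _, v in grp]
        out.append((items[0], items[-1]))
    return out

df = pd.DataFrame({'index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'X': [0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0]}).set_index('index')
print(first_last_of_consecutive(df.loc[df['X'] == 1].index))
# [(2, 3), (6, 8)]
```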
A NumPy approach, adapted from a great answer by @Warren Weckesser:
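import numpy as np

def runs(a):
    # Pad with 0s so runs touching either end still have a start and an end
    isone = np.concatenate(([0], np.equal(a, 1).view(np.int8), [0]))
    # |diff| is 1 exactly where a run of 1s starts or ends
    absdiff = np.abs(np.diff(isone))
    # Pair up (start, end-exclusive) positions
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return [(int(i), int(j) - 1) for i, j in ranges]

runs(df['X'].values)
# [(2, 3), (6, 8)]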
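Note that runs returns positions, not index labels; they coincide here only because the index is 0 through 10. A sketch of the same diff idea that maps run boundaries back to labels through the Series index (runs_by_label and the example Series s are hypothetical):

```python
import numpy as np
import pandas as pd

def runs_by_label(s):
    """Return (first, last) index labels of each run of 1s in Series s."""
    isone = np.concatenate(([0], (s.to_numpy() == 1).astype(np.int8), [0]))
    edges = np.flatnonzero(np.diff(isone))         # positions where a run starts or ends
    starts, stops = edges[0::2], edges[1::2] - 1   # make the stop inclusive
    return list(zip(s.index[starts].tolist(), s.index[stops].tolist()))

s = pd.Series([0, 1, 1, 0, 1], index=[10, 20, 30, 40, 50])
print(runs_by_label(s))
# [(20, 30), (50, 50)]

df = pd.DataFrame({'index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'X': [0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0]}).set_index('index')
print(runs_by_label(df['X']))
# [(2, 3), (6, 8)]
```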
Upvotes: 2
Reputation: 8631
You can use more_itertools.consecutive_groups:
import more_itertools as mit

def find_ranges(iterable):
    """Yield the first and last value of each run of consecutive numbers."""
    for group in mit.consecutive_groups(iterable):
        group = list(group)
        if len(group) == 1:
            # A run of length 1 is yielded as a single value, not a tuple
            yield group[0]
        else:
            yield group[0], group[-1]

list(find_ranges(df['X'][df['X'] == 1].index))
Output:
[(2, 3), (6, 8)]
Upvotes: 2