Reputation: 4492
I have the following pandas DataFrame:
import pandas as pd
foo = pd.DataFrame({'step': [1,2,3,4,5,6,7,8], 'val': [1,1,1,0,0,1,0,1]})
I would like to get the first and last step for each sequence of 1s in the val column.
Explanation:
The first sequence of ones happens at steps 1, 2, 3 -> first step is 1, last step is 3.
The second sequence of ones happens at step 6 -> first step is 6, last step is 6.
The last sequence of ones happens at step 8 -> first step is 8, last step is 8.
So the output is the list [1, 3, 6, 6, 8, 8].
Any ideas how to do that?
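As a cross-check of the expected output described above, here is a minimal plain-Python sketch that does not use pandas at all. It relies on the standard-library `itertools.groupby`, which is a swapped-in technique for illustration (not something from the question): it groups consecutive equal values, so each run of 1s becomes one group.

```python
from itertools import groupby

steps = [1, 2, 3, 4, 5, 6, 7, 8]
vals = [1, 1, 1, 0, 0, 1, 0, 1]

result = []
# groupby keys on val, so consecutive equal values form one run
for v, run in groupby(zip(steps, vals), key=lambda pair: pair[1]):
    if v == 1:
        run_steps = [step for step, _ in run]
        # keep the first and last step of this run of 1s
        result.extend([run_steps[0], run_steps[-1]])

print(result)  # [1, 3, 6, 6, 8, 8]
```

This is handy for validating a vectorized pandas solution on small inputs, but it loops in Python, so it will be slower on large frames.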
Upvotes: 1
Views: 57
Reputation: 765
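(foo.assign(col1=foo.val.diff().eq(1).cumsum())  # label each run of 1s
    .query("val == 1")                            # keep only the rows where val is 1
    .groupby('col1').step.agg(['first', 'last'])  # first and last step per run
    .stack().tolist())                            # flatten row by row into a list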
output: [1, 3, 6, 6, 8, 8]
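To see why the `col1` labels work, the sketch below prints the intermediate values for the question's `foo` (the names `starts` and `labels` are just for illustration). `diff()` equals 1 exactly where val goes from 0 to 1; it is NaN on the first row, so a run that starts at row 0 simply keeps label 0.

```python
import pandas as pd

foo = pd.DataFrame({'step': [1, 2, 3, 4, 5, 6, 7, 8],
                    'val': [1, 1, 1, 0, 0, 1, 0, 1]})

# True only on rows where a 0 -> 1 transition happens
starts = foo.val.diff().eq(1)
# the running count of transitions gives each run its own label
labels = starts.cumsum()

print(labels.tolist())  # [0, 0, 0, 0, 0, 1, 1, 2]
```

Note that a run of 1s shares its label with the 0s that follow it (rows 4-5 share label 0 with steps 1-3); that is harmless because `query("val == 1")` removes those zeros before grouping.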
Upvotes: 0
Reputation: 261860
IIUC, you can use a groupby aggregation, then flatten the result using numpy and convert it to a list:
# compute groups of consecutive identical values
group = foo['val'].ne(foo['val'].shift()).cumsum()
out = (foo
       .loc[foo['val'].eq(1), 'step']          # keep step only where value is 1
       .groupby(group).agg(['first', 'last'])  # get first and last step of each group
       .to_numpy().ravel().tolist()            # flatten row by row into a list
      )
output: [1, 3, 6, 6, 8, 8]
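The same approach can be wrapped in a small reusable function. This is a sketch only: `first_last_of_runs` and its `value` parameter are hypothetical names, and the column names 'step' and 'val' are assumed to match the question. The labels from `ne(shift()).cumsum()` change every time val changes, so for the example they are [1, 1, 1, 2, 2, 3, 4, 5]; filtering to one value and grouping by those labels (pandas aligns the grouping Series on the index) leaves one group per run.

```python
import pandas as pd


def first_last_of_runs(df, value=1):
    """Hypothetical helper: flat [first, last, first, last, ...] list of 'step'
    for every run of consecutive rows where 'val' == value."""
    # a new label starts whenever 'val' differs from the previous row
    group = df['val'].ne(df['val'].shift()).cumsum()
    out = (df.loc[df['val'].eq(value), 'step']     # keep only the wanted value
             .groupby(group).agg(['first', 'last']))  # one row per run
    return out.to_numpy().ravel().tolist()         # flatten row by row


foo = pd.DataFrame({'step': [1, 2, 3, 4, 5, 6, 7, 8],
                    'val': [1, 1, 1, 0, 0, 1, 0, 1]})
print(first_last_of_runs(foo))           # [1, 3, 6, 6, 8, 8]
print(first_last_of_runs(foo, value=0))  # [4, 5, 7, 7]
```

Labelling on "value changed" rather than on a 0 -> 1 transition gives every run (of 0s or 1s) its own label, which is why the same function also works for runs of 0s.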
Upvotes: 2