Mejdi Dallel
Mejdi Dallel

Reputation: 642

Python dataframe get index start and end of successive values

Let's say I have this dataframe :

   0
0  1
1  1
2  1
3  2
4  2
5  3
6  3
7  1
8  1

I want to store the start and end indexes of each value (even repeated ones) in the dataframe as well as the value corresponding.

So that I would get a result like this for example :

Value    |   Start   |   End
----------------------------
1        |     0     |    2
2        |     3     |    4
3        |     5     |    6
1        |     7     |    8

I tried this (for the value 2 for example here) :

cs[['key']] = pd.DataFrame(cs.pop(0).values.tolist())
g = cs.groupby('key')
idx_start, idx_end = g.get_group(2).index[[0,-1]]

But this returns only first and last result each time.

Upvotes: 2

Views: 1028

Answers (1)

timgeb
timgeb

Reputation: 78750

Given

>>> df
   0
0  1
1  1
2  1
3  2
4  2
5  3
6  3
7  1
8  1

Solution:

starts_bool = df.diff().ne(0)[0]
starts = df.index[starts_bool]
ends = df.index[starts_bool.shift(-1, fill_value=True)]

result = (df.loc[starts]
            .reset_index(drop=True)
            .assign(Start=starts, End=ends)
            .rename({0: 'Value'}, axis='columns')
          )

Result:

>>> result
   value  Start  End
0      1      0    2
1      2      3    4
2      3      5    6
3      1      7    8

Upvotes: 4

Related Questions