How to get start and end of ranges in pandas

Question

I have a pandas Series containing groups of numbers and nans, and I want to get the start and end of each group. The following code does this:

import numpy as np
import pandas as pd

def get_ranges(d):
    results = []
    start = None
    for i in range(len(d)):
        # .iloc replaces the removed .ix indexer for positional access.
        if np.isnan(d.iloc[i]):
            continue
        # A non-NaN value opens a new range if none is open.
        if start is None:
            start = d.index[i]
        # Close the range at the last element or just before a NaN.
        if i == len(d) - 1 or np.isnan(d.iloc[i + 1]):
            results.append((start, d.index[i]))
            start = None
    return pd.DataFrame(results, columns=['start', 'end'])

E.g.:

In [24]: d = pd.Series([0, 1, 4, 2, np.nan, np.nan, np.nan, 4, 2, np.nan, 10, np.nan])

In [25]: get_ranges(d)
Out[25]: 
   start  end
0      0    3
1      7    8
2     10   10

[3 rows x 2 columns]

But it seems like this is something that pandas should be able to do quite easily, possibly using groupby. Is there some built-in method of getting these groups that I'm missing?

waitingkuo · Accepted Answer

I'm not sure whether there is a more convenient way to do this, but here is what I use.

First, get the index labels of the entries that hold numbers rather than NaN:

In [134]: s = d.dropna().index.to_series()

In [135]: s
Out[135]: 
0      0
1      1
2      2
3      3
7      7
8      8
10    10
dtype: int64

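Next, get the start and end of each range. A label starts a range when it is not exactly 1 greater than the previous label, and ends a range when the next label is not exactly 1 greater than it (this relies on the default consecutive integer index):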

In [136]: start = s[s.diff(1) != 1].reset_index(drop=True)

In [137]: end = s[s.diff(-1) != -1].reset_index(drop=True)
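To see why the two `diff` comparisons pick out the boundaries, it may help to lay the intermediate values side by side. The following is only an illustrative sketch using the example series from the question; the names `prev_gap`, `next_gap`, and `steps` are made up for this illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

d = pd.Series([0, 1, 4, 2, np.nan, np.nan, np.nan, 4, 2, np.nan, 10, np.nan])
s = d.dropna().index.to_series()

# Gap to the previous label: NaN for the first entry, 1 inside a run.
prev_gap = s.diff(1)
# Gap to the next label: NaN for the last entry, -1 inside a run.
next_gap = s.diff(-1)

# NaN compares as "not equal", so the first and last labels are
# always flagged as a start and an end respectively.
steps = pd.DataFrame({
    'label': s,
    'prev_gap': prev_gap,
    'next_gap': next_gap,
    'is_start': prev_gap != 1,
    'is_end': next_gap != -1,
})
print(steps)
```

Rows where `is_start` is true are the labels kept in `start`, and rows where `is_end` is true are the labels kept in `end`.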

Finally, you can construct the result with:

In [138]: pd.DataFrame({'start': start, 'end': end}, columns=['start', 'end'])
Out[138]: 
   start  end
0      0    3
1      7    8
2     10   10

[3 rows x 2 columns]
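The steps above compare index labels, so they assume a default integer index with consecutive values. As a rough sketch of the same idea that works on positions instead, and so should also handle string or datetime indexes, the approach could be wrapped in a function like the one below. The name `get_ranges_vectorized` is hypothetical, and this is a variant of the answer's technique rather than the answer's own code.

```python
import numpy as np
import pandas as pd

def get_ranges_vectorized(d: pd.Series) -> pd.DataFrame:
    """Return the first and last index label of each run of non-NaN values."""
    # Integer positions (not labels) of the non-NaN entries.
    pos = np.flatnonzero(d.notna().to_numpy())
    if pos.size == 0:
        return pd.DataFrame({'start': [], 'end': []})
    # A run breaks wherever consecutive positions are not adjacent.
    breaks = np.flatnonzero(np.diff(pos) != 1)
    # Runs start at the first position and right after each break;
    # they end at each break and at the last position.
    start_pos = pos[np.r_[0, breaks + 1]]
    end_pos = pos[np.r_[breaks, pos.size - 1]]
    # Map positions back to index labels.
    return pd.DataFrame({'start': d.index[start_pos], 'end': d.index[end_pos]})

d = pd.Series([0, 1, 4, 2, np.nan, np.nan, np.nan, 4, 2, np.nan, 10, np.nan])
ranges = get_ranges_vectorized(d)
print(ranges)
```

For the example series this should give the same three ranges as the loop in the question.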
