Inspi
Inspi

Reputation: 599

Pandas Series get mean value of multiple intervals

With an unevenly spaced datetime Series like so:

date
2019-02-20 13:00:49.268    41.177929
2019-02-20 13:00:50.275    12.431984
2019-02-20 13:00:51.397    18.042411
2019-02-20 13:00:52.434    13.144179
2019-02-20 13:00:53.542    21.349083
                             ...    
2019-02-20 13:05:55.059    51.763360
2019-02-20 13:05:56.169    58.140644
2019-02-20 13:05:57.279     0.411533
2019-02-20 13:05:58.408    48.404780
2019-02-20 13:05:59.518    14.626680
Name: Values, Length: 285, dtype: float64

And with a provided list of datetime intervals (no overlapping) formatted like: [(start1, end1), (start2, end2), ...]

What would be a short way to get the mean value of the Series in the combined intervals? (any interpolation function can be used here)

I'm a pandas beginner and didn't find anything here or in Pandas docs, sorry if this is a dumb question...

Upvotes: 4

Views: 1579

Answers (2)

Valdi_Bo
Valdi_Bo

Reputation: 30971

Let's define the test source Series as:

2019-02-20 13:00:49.268    40
2019-02-20 13:00:50.275    30
2019-02-20 13:02:51.397    18
2019-02-20 13:02:52.434    13
2019-02-20 13:05:53.542    21
2019-02-20 13:05:55.059    51
2019-02-20 13:06:56.169    32
2019-02-20 13:06:57.279    38
2019-02-20 13:08:58.408    48
2019-02-20 13:08:59.518    14
Name: Val, dtype: int64

and the list of intervals as:

intv = [(pd.to_datetime('2019-02-20 13:00'), pd.to_datetime('2019-02-20 13:01')),
        (pd.to_datetime('2019-02-20 13:06'), pd.to_datetime('2019-02-20 13:07'))]

A preparatory step is to create an IntervalIndex:

intvInd = pd.IntervalIndex.from_tuples(intv)

Then the mean you want can be computed as:

s[[intvInd.contains(v) for v in s.index.values]].mean()

Note

I tried the above solution using Pandas version 0.24.2. As Inspi noticed, at least in version 0.25 the last instruction must be changed to:

s[[any(intvInd.contains(v)) for v in s.index.values]].mean()

Apparently, version 0.24 adds this any(...) somewhere "under the hood". In version 0.25 this behavior does not take place.

Upvotes: 3

Code Different
Code Different

Reputation: 93161

Assume your time series is indexed by date:

dates = pd.date_range('2019-07-01', '2019-07-25', freq='T')
s = pd.Series(np.random.uniform(1, 100, len(dates)), index=dates)

Some sample data:

2019-07-01 00:00:00    54.851538
2019-07-01 00:01:00    82.493677
2019-07-01 00:02:00    80.589765
2019-07-01 00:03:00    54.973948
2019-07-01 00:04:00    18.216064

And your intervals are defined in a data frame:

intervals = pd.DataFrame([
    ['2019-07-01', '2019-07-02'],
    ['2019-07-02', '2019-07-10']
], columns=['StartDate', 'EndDate'], dtype='datetime64[ns]')

Simply apply a function over each row in intervals:

intervals['value'] = intervals.apply(lambda row: s[(row['StartDate'] <= s.index) & (s.index < row['EndDate'])].mean(), axis=1)

s here is evenly spaced out but that doesn't change the outcome. The date intervals are open on the right (the < sign). You can change that depending on your requirements.

Upvotes: 1

Related Questions