Reputation: 599
With an unevenly spaced datetime Series like so:
date
2019-02-20 13:00:49.268 41.177929
2019-02-20 13:00:50.275 12.431984
2019-02-20 13:00:51.397 18.042411
2019-02-20 13:00:52.434 13.144179
2019-02-20 13:00:53.542 21.349083
...
2019-02-20 13:05:55.059 51.763360
2019-02-20 13:05:56.169 58.140644
2019-02-20 13:05:57.279 0.411533
2019-02-20 13:05:58.408 48.404780
2019-02-20 13:05:59.518 14.626680
Name: Values, Length: 285, dtype: float64
And with a provided list of datetime intervals (no overlapping) formatted like: [(start1, end1), (start2, end2), ...]
What would be a short way to get the mean value of the Series in the combined intervals? (any interpolation function can be used here)
I'm a pandas beginner and didn't find anything here or in Pandas docs, sorry if this is a dumb question...
Upvotes: 4
Views: 1579
Reputation: 30971
Let's define the test source Series as:
2019-02-20 13:00:49.268 40
2019-02-20 13:00:50.275 30
2019-02-20 13:02:51.397 18
2019-02-20 13:02:52.434 13
2019-02-20 13:05:53.542 21
2019-02-20 13:05:55.059 51
2019-02-20 13:06:56.169 32
2019-02-20 13:06:57.279 38
2019-02-20 13:08:58.408 48
2019-02-20 13:08:59.518 14
Name: Val, dtype: int64
and the list of intervals as:
intv = [(pd.to_datetime('2019-02-20 13:00'), pd.to_datetime('2019-02-20 13:01')),
(pd.to_datetime('2019-02-20 13:06'), pd.to_datetime('2019-02-20 13:07'))]
A preparatory step is to create an IntervalIndex:
intvInd = pd.IntervalIndex.from_tuples(intv)
Then the mean you want can be computed as:
s[[intvInd.contains(v) for v in s.index.values]].mean()
Note
I tried the above solution using Pandas version 0.24.2. As Inspi noticed, at least in version 0.25 the last instruction must be changed to:
s[[any(intvInd.contains(v)) for v in s.index.values]].mean()
Apparently, version 0.24 adds this any(...) somewhere "under the hood". In version 0.25 this behavior does not take place.
Upvotes: 3
Reputation: 93161
Assume your time series is indexed by date:
dates = pd.date_range('2019-07-01', '2019-07-25', freq='T')
s = pd.Series(np.random.uniform(1, 100, len(dates)), index=dates)
Some sample data:
2019-07-01 00:00:00 54.851538
2019-07-01 00:01:00 82.493677
2019-07-01 00:02:00 80.589765
2019-07-01 00:03:00 54.973948
2019-07-01 00:04:00 18.216064
And your intervals are defined in a data frame:
intervals = pd.DataFrame([
['2019-07-01', '2019-07-02'],
['2019-07-02', '2019-07-10']
], columns=['StartDate', 'EndDate'], dtype='datetime64[ns]')
Simply apply a function over each row in intervals
:
intervals['value'] = intervals.apply(lambda row: s[(row['StartDate'] <= s.index) & (s.index < row['EndDate'])].mean(), axis=1)
s
here is evenly spaced out but that doesn't change the outcome. The date intervals are open on the right (the <
sign). You can change that depending on your requirements.
Upvotes: 1