nrlakin

Reputation: 5584

How can I get all entries in a DateTimeIndexed pandas series that occur in a list of days?

I have a series of hourly data, and a Python list of dates that I'm interested in examining:

>>> hourly
KWH_DTTM
2015-06-20 15:00:00    2138.4
2015-06-20 16:00:00    4284.0
2015-06-20 17:00:00    4168.8
                    ...  
2017-06-21 21:00:00    2743.2
2017-06-21 22:00:00    2757.6
2017-06-21 23:00:00    2635.2
Freq: H, Name: KWH, Length: 17577, dtype: float64

>>> days
[datetime.date(2017, 5, 5), datetime.date(2017, 5, 8), datetime.date(2017, 5, 9), datetime.date(2017, 6, 2)]

I am trying to figure out how to select all entries from hourly that land on a day in days (days is about 50 entries long, and dates can be arbitrary). days is currently a list of Python date objects, but I don't care if they're strings, etc.

If I index hourly with days, I get a series that has been resampled to daily intervals:

>>> hourly[days]
KWH_DTTM
2017-05-05    2628.0
2017-05-08    2628.0
2017-05-09    2548.8
2017-06-02    2512.8
Name: KWH, Length: 30, dtype: float64

If I index with a single day, rendered to a string, I get the desired output for that day:

>>> hourly['2017-5-5']
KWH_DTTM
2017-05-05 00:00:00    2505.6
2017-05-05 01:00:00    2563.2
2017-05-05 02:00:00    2505.6
                    ...  
2017-05-05 21:00:00    2268.0
2017-05-05 22:00:00    2232.0
2017-05-05 23:00:00    2088.0
Freq: H, Name: KWH, Length: 24, dtype: float64

Is there a way to do this besides looping over my list of days and concatenating the results?
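For comparison, the loop-and-concatenate approach mentioned above fits in a single expression. This is a minimal sketch: the small `hourly` series and `days` list are made-up stand-ins for the real data. It relies on `.loc` with a `YYYY-MM-DD` string, which selects every hourly row on that calendar day, the same partial-string indexing shown with `hourly['2017-5-5']`.

```python
import datetime

import numpy as np
import pandas as pd

# Hypothetical stand-in data: three days of hourly readings (2017-05-05 .. 2017-05-07).
index = pd.date_range('2017-05-05', periods=72, freq='h', name='KWH_DTTM')
hourly = pd.Series(np.arange(72, dtype=float), index=index, name='KWH')

days = [datetime.date(2017, 5, 5), datetime.date(2017, 5, 7)]

# Partial-string indexing: hourly.loc['2017-05-05'] returns all 24 hours of that day;
# pd.concat stitches the per-day pieces back into one Series.
selected = pd.concat([hourly.loc[d.strftime('%Y-%m-%d')] for d in days])
print(len(selected))  # 48
```

This works, but it builds one intermediate Series per day, which is what the question hopes to avoid.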

Upvotes: 0

Views: 52

Answers (2)

Parfait

Reputation: 107587

Consider building a boolean series with Series.apply(): pass each DatetimeIndex value and use a list comprehension to check whether its date equals any element of days. Then use this boolean series to filter the hourly series.

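import datetime

import numpy as np
import pandas as pd

# DATA EXAMPLE
np.random.seed(45)
# pd.DatetimeIndex(start=..., periods=..., freq=...) was removed in pandas 1.0;
# pd.date_range builds the same hourly index.
hourly = pd.Series(index=pd.date_range(start='2016-09-05 00:00:00',
                                       periods=17577, freq='h'),
                   data=np.random.randn(17577),
                   name='KWH_DTTM')

days = [datetime.date(2017, 5, 5), datetime.date(2017, 5, 8),
        datetime.date(2017, 5, 9), datetime.date(2017, 6, 2)]

# BOOLEAN SERIES: True where the timestamp's calendar date matches any day in days
bools = pd.Series(hourly.index.values).apply(
    lambda x: any(x.date() == d for d in days))
bools.index = hourly.index

# FILTER ORIGINAL SERIES
newhourly = hourly[bools]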

print(newhourly.head(10))
# 2017-05-05 00:00:00   -0.238799
# 2017-05-05 01:00:00   -0.263365
# 2017-05-05 02:00:00   -0.249632
# 2017-05-05 03:00:00    0.131630
# 2017-05-05 04:00:00   -1.279383
# 2017-05-05 05:00:00    0.411316
# 2017-05-05 06:00:00   -2.059022
# 2017-05-05 07:00:00   -1.008058
# 2017-05-05 08:00:00   -0.365651
# 2017-05-05 09:00:00    1.515522
# Name: KWH_DTTM, dtype: float64

print(newhourly.tail(10))
# 2017-06-02 14:00:00    0.329567
# 2017-06-02 15:00:00   -0.618604
# 2017-06-02 16:00:00    0.848719
# 2017-06-02 17:00:00   -1.152657
# 2017-06-02 18:00:00    0.269618
# 2017-06-02 19:00:00   -1.806861
# 2017-06-02 20:00:00   -0.188643
# 2017-06-02 21:00:00    0.515790
# 2017-06-02 22:00:00    0.384695
# 2017-06-02 23:00:00    1.115494
# Name: KWH_DTTM, dtype: float64
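A vectorized variant of the same boolean mask avoids calling a Python lambda once per row. This is an alternative technique sketched here, not the answer's own code: `DatetimeIndex.normalize()` truncates every timestamp to midnight, and `isin` compares those against the requested days after converting them with `pd.to_datetime`. The data setup mirrors the example above.

```python
import datetime

import numpy as np
import pandas as pd

np.random.seed(45)
hourly = pd.Series(np.random.randn(17577),
                   index=pd.date_range('2016-09-05', periods=17577, freq='h'),
                   name='KWH_DTTM')
days = [datetime.date(2017, 5, 5), datetime.date(2017, 5, 8),
        datetime.date(2017, 5, 9), datetime.date(2017, 6, 2)]

# Truncate each timestamp to midnight, then test membership in the requested days.
mask = hourly.index.normalize().isin(pd.to_datetime(days))
newhourly = hourly[mask]
print(len(newhourly))  # 96 (4 days x 24 hours)
```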

Upvotes: 1

NickBraunagel

Reputation: 1599

You could convert hourly to a DataFrame, and then use .isin():

df = hourly.rename_axis('hours').reset_index(name='KWH')
df = df[df.hours.dt.date.isin(days)]

Here's the complete code with random data:

import datetime
import random

import pandas as pd

# 999 random readings at random hours between 2014 and 2016
random_data = [random.randint(1000, 2000) for x in range(1, 1000)]
hours = [datetime.datetime(random.randint(2014, 2016), random.randint(1, 12),
                           random.randint(1, 28), random.randint(1, 23), 0)
         for x in range(1, 1000)]

hourly = pd.Series(data=random_data, index=hours)

dates = [datetime.date(random.randint(2014, 2016), random.randint(1, 12),
                       random.randint(1, 28)) for x in range(1, 10)]

# Name the index 'hours' so reset_index produces an 'hours' column,
# then keep only the rows whose calendar date is in dates.
df = hourly.rename_axis('hours').reset_index(name='KWH')
df = df[df.hours.dt.date.isin(dates)]
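If a timestamp-indexed Series is wanted back rather than a DataFrame, the filtered frame can be turned around with `set_index`. A minimal sketch with hypothetical stand-in data, assuming the column names `hours` and `KWH` used above:

```python
import datetime

import pandas as pd

# Hypothetical stand-in data: two days of hourly readings.
hourly = pd.Series(range(48), index=pd.date_range('2015-01-01', periods=48, freq='h'))
dates = [datetime.date(2015, 1, 2)]

df = hourly.rename_axis('hours').reset_index(name='KWH')
df = df[df.hours.dt.date.isin(dates)]

# Restore the timestamp index and pull out the single value column as a Series.
result = df.set_index('hours')['KWH']
print(result.index[0])  # 2015-01-02 00:00:00
```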

Upvotes: 1
