Meiiso

Reputation: 399

Drop datetimes not within certain range from index

I have a DataFrame like this:

Date                 X
....
2014-01-02 07:00:00 16
2014-01-02 07:15:00 20
2014-01-02 07:30:00 21
2014-01-02 07:45:00 33
2014-01-02 08:00:00 22
....
2014-01-02 23:45:00 0
....

1) My "Date" column is a datetime and has a value for every 15 minutes of the day.

What I want is to remove ALL rows where the time is NOT between 08:00 and 18:00.

2) Some days are missing in the data. How could I put the missing days into my DataFrame and fill them with the value 0 as X?

My approach: create a new Series between two dates with a frequency of 15min, then concat my X column with the newly created Series. Is that right?


Edit: Problem for my second Question:

#create new full DF without missing dates and reindex
full_range = pandas.date_range(start='2014-01-02', end='2017-11-14', freq='15min')
df = df.reindex(full_range,fill_value=0)

df.head()

Output:

                    Date        X
2014-01-02 00:00:00 1970-01-01  0
2014-01-02 00:15:00 1970-01-01  0
2014-01-02 00:30:00 1970-01-01  0
2014-01-02 00:45:00 1970-01-01  0
2014-01-02 01:00:00 1970-01-01  0

That didn't work, as you can see.

The "Date" column is not an index, by the way; I need it as a column in my df.

And why did it use "1970-01-01"? 1970 as the year makes no sense to me.

Upvotes: 0

Views: 281

Answers (1)

Brad Solomon

Reputation: 40878

What I want is to remove ALL Rows where the time is NOT between 08:00 and 18:00 o'clock.

Create a mask with datetime.time. Example:

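from datetime import time

import numpy as np
import pandas as pd

# Example data: 10,000 rows at 15-minute intervals (values are placeholders)
idx = pd.date_range('2014-01-02', freq='15min', periods=10000)
df = pd.DataFrame({'x': np.empty(idx.shape[0])}, index=idx)

# Keep only rows whose time of day falls inside the window
t1 = time(8); t2 = time(18)
times = df.index.time
mask = (times >= t1) & (times <= t2)  # use > and < to exclude 08:00 and 18:00
df = df.loc[mask]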
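
If "Date" is a regular column rather than the index, as in the question, one alternative to the manual mask is pandas' between_time. The sketch below assumes a small made-up frame with columns named "Date" and "X"; between_time needs a DatetimeIndex, so the column is moved to the index, filtered, and moved back. Both endpoints are included by default.

```python
import pandas as pd

# Hypothetical sample: one day of 15-minute data with "Date" as a regular column
df = pd.DataFrame({
    "Date": pd.date_range("2014-01-02", periods=96, freq="15min"),
    "X": range(96),
})

# between_time only works on a DatetimeIndex, so set "Date" as the index,
# keep rows from 08:00 through 18:00, then restore "Date" as a column.
filtered = (
    df.set_index("Date")
      .between_time("08:00", "18:00")
      .reset_index()
)
```

This keeps the original df untouched and returns a new frame with a fresh integer index.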

Some days are missing in the data...how could I put the missing days in my DataFrame and fill them with the value 0 as X?

  1. Build a date range that doesn't have missing data with pd.date_range() (see above).
  2. Call reindex() on df and specify fill_value=0.
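
The two steps above only work when the dates are the index. In the edit to the question, df still has its default integer index, so none of the new datetime labels match an existing row; every row is new and gets fill_value=0 in every column, and 0 stored in a datetime column is displayed as 1970-01-01 (the Unix epoch). A minimal sketch of one way around that, assuming made-up sample data with columns "Date" and "X":

```python
import pandas as pd

# Hypothetical sample with one whole day (2014-01-03) missing
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2014-01-02 08:00", "2014-01-02 08:15", "2014-01-04 08:00"]
    ),
    "X": [22, 20, 21],
})

# Complete 15-minute grid covering all three days
full_range = pd.date_range("2014-01-02", "2014-01-04 23:45", freq="15min")

filled = (
    df.set_index("Date")                  # reindex aligns on the index, not on a column
      .reindex(full_range, fill_value=0)  # timestamps not present get X = 0
      .rename_axis("Date")                # the new index has no name; restore it
      .reset_index()                      # turn "Date" back into a regular column
)
```

Existing rows keep their X values, missing timestamps get 0, and "Date" ends up as a column again.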

Answering your questions in comments:

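  • np.empty creates an uninitialized array: the memory is allocated, but the values are whatever happened to be there. I was just using it to build some "example" data that is basically garbage. Here idx.shape is the shape of your index, a tuple; for a one-dimensional index it has a single element, (length,). So np.empty(idx.shape[0]) creates an uninitialized 1d array with the same length as idx.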
  • times = df.index.time creates a variable (a NumPy array) called times. df.index.time is the time for each element in the index of df. You can explore this yourself by just breaking the code down in pieces and experimenting with it on your own.
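
As a rough illustration of the two points above, the pieces can be inspected one at a time on a tiny index (the four-element index here is just for demonstration):

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2014-01-02", freq="15min", periods=4)

shape = idx.shape            # one-element tuple for a 1-D index
values = np.empty(shape[0])  # uninitialized floats, same length as idx
times = idx.time             # NumPy object array of datetime.time values

print(shape)
print(times)
```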

Upvotes: 2
