Reputation: 375
I have a DataFrame containing six months of error logs, collected every day. I want to retrieve the records from the last 30 days, counting back from the last date in the data. That last date isn't today.
For example, I have data for May, June, July, and August up to August 15, and I want to retrieve the data from August 15 back to July 15, i.e. 30 days of records.
Is there a way to do this in Python Pandas?
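Counting back exactly 30 days from August 15 actually lands on July 16, so the example window is slightly ambiguous. A quick check with pandas date arithmetic (assuming the year is 2015, as in the sample data):

```python
import pandas as pd

# Last date from the example (assumption: year 2015, as in the sample data).
last_date = pd.Timestamp("2015-08-15")

# Subtracting a 30-day Timedelta gives the lower bound of the window.
cutoff = last_date - pd.Timedelta(days=30)
print(cutoff)  # 2015-07-16 00:00:00

# An inclusive window [cutoff, last_date] spans 31 calendar dates.
n_days = len(pd.date_range(cutoff, last_date, freq="D"))
print(n_days)  # 31
```

Keeping dates >= cutoff therefore returns 31 calendar dates; compare with > instead if exactly 30 are wanted.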
This is the sample dataframe:
Error_Description         Date       Weekend    Type
N17739 Limit switch X-    5/1/2015   5/3/2015   Critical
N17739 Limit switch Y-    5/1/2015   5/3/2015   Critical
N938 Key non-functional   5/1/2015   5/3/2015   Non-Critical
P124 Magazine is running  5/1/2015   5/3/2015   Non-Critical
N17738 Limit switch Z+    5/1/2015   5/3/2015   Critical
N938 Key non-functional   5/1/2015   5/3/2015   Non-Critical
...                       ...        ...        ...
P873 ENCLOSURE DOOR       8/24/2015  8/30/2015  Non-Critical
N3065 Reset M114          8/24/2015  8/30/2015  Non-Critical
N3065 Reset M114,         8/24/2015  8/30/2015  Non-Critical
N2853 Synchronization     8/24/2015  8/30/2015  Critical
P152 ENCLOSURE            8/24/2015  8/30/2015  Non-Critical
N6236 has stopped         8/24/2015  8/30/2015  Critical
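The Date and Weekend columns above are M/D/YYYY strings, and date comparisons need them parsed into datetimes first. A minimal sketch that rebuilds a few of the sample rows and parses them with pd.to_datetime (only four rows are used, and the format string is an assumption based on how the sample looks):

```python
import pandas as pd

# A few rows from the sample above, with dates still as M/D/YYYY strings
# (the "..." rows in between are omitted).
df = pd.DataFrame({
    "Error_Description": [
        "N17739 Limit switch X-",
        "N938 Key non-functional",
        "N2853 Synchronization",
        "N6236 has stopped",
    ],
    "Date": ["5/1/2015", "5/1/2015", "8/24/2015", "8/24/2015"],
    "Weekend": ["5/3/2015", "5/3/2015", "8/30/2015", "8/30/2015"],
    "Type": ["Critical", "Non-Critical", "Critical", "Critical"],
})

# Parse both date columns so comparisons and Timedelta arithmetic work.
for col in ["Date", "Weekend"]:
    df[col] = pd.to_datetime(df[col], format="%m/%d/%Y")

print(df.dtypes)
print(df["Date"].max())  # 2015-08-24 00:00:00
```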
Upvotes: 16
Views: 21158
Reputation: 475
The other two answers (at the time of writing) assume the date is the index, but, in Python 3 at least, you can solve this with simple boolean masking (.query(...) doesn't work):
df[df["Date"] >= (pd.to_datetime('8/24/2015') - pd.Timedelta(days=30))]
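A variant of the mask above that takes the last date from the data itself instead of hard-coding '8/24/2015'. The daily sample data here is hypothetical:

```python
import pandas as pd

# Hypothetical sample: one row per day from 2015-05-01 to 2015-08-24.
df = pd.DataFrame({"Date": pd.date_range("2015-05-01", "2015-08-24", freq="D")})
df["Type"] = "Critical"

# Take the last date from the data instead of hard-coding it.
last_date = df["Date"].max()
cutoff = last_date - pd.Timedelta(days=30)

# Boolean mask: keep rows on or after the cutoff; no index or sorting needed.
recent = df[df["Date"] >= cutoff]
print(recent["Date"].min(), recent["Date"].max(), len(recent))
# 2015-07-25 00:00:00 2015-08-24 00:00:00 31
```

Because it is a plain mask, this also works when the rows are not sorted by date.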
Upvotes: 6
Reputation: 862611
The date lastdayfrom is used to select the last 30 days of the DataFrame with loc.
import pandas as pd
lastdayfrom = pd.to_datetime('8/24/2015')
print(lastdayfrom)
#2015-08-24 00:00:00
print(df)
# Error_Description Date Weekend Type
#0 N17739 Limit switch X- 2015-05-01 2015-05-03 Critical
#1 N17739 Limit switch Y- 2015-05-01 2015-05-03 Critical
#2 N938 Key non-functional 2015-05-01 2015-05-03 Non-Critical
#3 P124 Magazine is running 2015-05-01 2015-05-03 Non-Critical
#4 N17738 Limit switch Z+ 2015-02-01 2015-05-03 Critical
#5 N938 Key non-functional 2015-07-25 2015-05-03 Non-Critical
#6 P873 ENCLOSURE DOOR 2015-07-24 2015-08-30 Non-Critical
#7 N3065 Reset M114 2015-07-21 2015-08-21 Non-Critical
#8 N3065 Reset M114, 2015-08-22 2015-08-22 Non-Critical
#9 N2853 Synchronization 2015-08-23 2015-08-30 Critical
#10 P152 ENCLOSURE 2015-08-24 2015-08-30 Non-Critical
#11 N6236 has stopped 2015-08-24 2015-08-30 Critical
print(df.dtypes)
#Error_Description object
#Date datetime64[ns]
#Weekend datetime64[ns]
#Type object
#dtype: object
# use the Date column as the index
df = df.set_index('Date')
# sort the DatetimeIndex if it isn't already in order
df = df.sort_index()
# select the 30 days up to and including lastdayfrom (loc slices are inclusive)
df = df.loc[lastdayfrom - pd.Timedelta(days=30):lastdayfrom].reset_index()
print(df)
# Date Error_Description Weekend Type
#0 2015-07-25 N938 Key non-functional 2015-05-03 Non-Critical
#1 2015-08-22 N3065 Reset M114, 2015-08-22 Non-Critical
#2 2015-08-23 N2853 Synchronization 2015-08-30 Critical
#3 2015-08-24 P152 ENCLOSURE 2015-08-30 Non-Critical
#4 2015-08-24 N6236 has stopped 2015-08-30 Critical
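A self-contained version of the same steps, using the Date and Error_Description values from the sample printed above (Weekend and Type omitted for brevity) and taking lastdayfrom from the data rather than hard-coding it:

```python
import pandas as pd

# Dates and descriptions from the sample printed above.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2015-05-01", "2015-05-01", "2015-05-01", "2015-05-01",
        "2015-02-01", "2015-07-25", "2015-07-24", "2015-07-21",
        "2015-08-22", "2015-08-23", "2015-08-24", "2015-08-24",
    ]),
    "Error_Description": [
        "N17739 Limit switch X-", "N17739 Limit switch Y-",
        "N938 Key non-functional", "P124 Magazine is running",
        "N17738 Limit switch Z+", "N938 Key non-functional",
        "P873 ENCLOSURE DOOR", "N3065 Reset M114",
        "N3065 Reset M114,", "N2853 Synchronization",
        "P152 ENCLOSURE", "N6236 has stopped",
    ],
})

# Use the latest date in the data rather than a hard-coded one.
lastdayfrom = df["Date"].max()

# Label slicing on a DatetimeIndex is only reliable on a sorted index;
# both endpoints of a .loc slice are inclusive.
out = (
    df.set_index("Date")
      .sort_index()
      .loc[lastdayfrom - pd.Timedelta(days=30):lastdayfrom]
      .reset_index()
)
print(out)
```

The row dated 2015-07-25, exactly 30 days before 2015-08-24, is kept because the slice includes its start label.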
Upvotes: 13
Reputation: 334
You can use DataFrame.last_valid_index() to find the label of the last row with any non-null value (the last date, if the DataFrame has a sorted DatetimeIndex), and then subtract DateOffset(days=30) to go back 30 days:
df.loc[df.last_valid_index() - pandas.DateOffset(days=30):]
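A minimal sketch of this approach on hypothetical daily data with a sorted DatetimeIndex, using the question's August 15 end date:

```python
import pandas as pd

# Hypothetical daily data indexed by date (a sorted DatetimeIndex).
idx = pd.date_range("2015-05-01", "2015-08-15", freq="D")
df = pd.DataFrame({"errors": range(len(idx))}, index=idx)

# Label of the last row with any non-null value: here, the last date.
last = df.last_valid_index()

# Inclusive label slice from 30 days before the last date to the end.
recent = df.loc[last - pd.DateOffset(days=30):]
print(recent.index.min(), recent.index.max(), len(recent))
# 2015-07-16 00:00:00 2015-08-15 00:00:00 31
```

Since last_valid_index() skips trailing rows that are entirely NaN, df.index.max() is an alternative when such rows should still count as the last date.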
Upvotes: 6