Reputation: 5228
I have the below pandas dataframe
that I am trying to filter, to provide me with an updated dataframe
based on the last date & time
in my database.
This is a sample of the dataframe
I am trying to filter:
>>> df
# The time is in '%H:%M:%S' format, and the date is in '%d-%b-%Y'
Time Date Items
00:05:00 29-May-2018 foo
00:06:00 30-May-2018 barr
00:07:00 31-May-2018 gaaa
00:11:00 31-May-2018 raaa
... ... ...
What I am trying to do, is to filter this dataframe
based on the last entry in my sql database. For example, the last entry is: ['20:05:00','30-May-2018']
. The below code is what I used to filter out from df
:
last_entry = ['20:05:00','30-May-2018']
# Putting time into a datetime format to work within the dataframe.
last_entry_time = datetime.strptime(last_entry[0], '%H:%M:%S').time()
new_entry = df[(df['Date'] >= last_entry[1]) & (df['Time'] > last_entry_time)]
If I were to just have the filter as: new_entry = df[df['Date'] >= last_entry[1])]
instead, this works well to return the current date and newer based on the last date, which is: 30-May-2018
and 31-May-2018
.
However, regarding the time portion, because my last_entry
time is 20:05:00
, it starts to filter out the rest of the data that i'm trying to collect...
Question:
How can I perform the filter of the dataframe
, such that it returns me the new entries in the dataframe, that is based off the old date and time
in the database?
Ideal result
last_entry = ['20:05:00','30-May-2018']
>>> new_entry
Time Date Items
00:07:00 31-May-2018 gaaa
00:11:00 31-May-2018 raaa
... ... ...
Upvotes: 0
Views: 100
Reputation: 4660
One option is to create a datetime column in your DataFrame, and then filter on this column, for example:
df["real_date"] = pd.to_datetime(df["Date"], format="%d-%b-%Y")
df["real_time"] = pd.to_timedelta(df["Time"])
df["real_datetime"] = df["real_date"] + df["real_time"]
You also need to convert your last_entry
variable to a proper datetime, for example like this:
from dateutil.parser import parse
from datetime import datetime
date = parse(last_entry[1], dayfirst=True)
time_elements = [int(t) for t in last_entry[0].split(":")]
last_entry_dt = datetime(date.year, date.month, date.day, time_elements[0], time_elements[1], time_elements[2])
Then you can filter the new DataFrame column like so:
df[df["real_datetime"] >= last_entry_dt]
Upvotes: 1