Reputation: 45
I'm trying to filter a pandas DataFrame based on the dates in one of its columns. For example, I have a column called 'Date' that has been converted to datetime using
df['Date'] = pd.to_datetime(df['Date'])
The values then display in a format such as 2019-06-01. Now I can filter on the column, so if I wanted only the dates in June I could do
df[(df['Date'] >= '2019-06-01') & (df['Date'] <= '2019-06-30')]
This works fine even though it compares datetimes to strings; I assume pandas automatically converts the strings to datetimes to perform the comparison.
However, this stops working as soon as I assign the comparison strings to variables. If I do this
start = '2019-06-01'
end = '2019-06-30'
df[(df['Date'] >= start) & (df['Date'] <= end)]
I get an error: TypeError: Invalid comparison between dtype=datetime64[ns] and str
Any ideas on why this may be occurring?
Upvotes: 0
Views: 892
Reputation: 30991
I am using pandas 0.25 and Python 3.7.0.
I ran your code:
start = '2019-06-01'
end = '2019-06-30'
df[(df['Date'] >= start) & (df['Date'] <= end)]
and got the correct result, with no error.
If you are using an older version of either Python or pandas, consider upgrading.
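To see which versions are installed before deciding whether to upgrade, a minimal check like the following should be enough. It relies only on `sys.version` and `pd.__version__`, both of which are standard attributes:

```python
import sys
import pandas as pd

# Report the interpreter and pandas versions currently in use
python_version = sys.version.split()[0]
pandas_version = pd.__version__

print('Python:', python_version)
print('pandas:', pandas_version)
```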
I also checked other variants of the code:
Converting the boundary values to datetime:
d1 = pd.to_datetime('2019-06-01')
d2 = pd.to_datetime('2019-06-30')
df[df.Date.between(d1, d2)]
Using between with both arguments as strings:
df[df.Date.between('2019-06-01', '2019-06-30')]
Both also give the correct result. Try them on your installation as it is now and again after upgrading, if you decide to do so.
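Here is a self-contained sketch you can run to reproduce the test. The sample rows are made up for illustration, since the original DataFrame is not shown in the question. Converting the bounds with `pd.to_datetime` first makes the comparison datetime-to-datetime, so it does not depend on how a given pandas version handles strings:

```python
import pandas as pd

# Hypothetical sample data; the question's real DataFrame is not shown
df = pd.DataFrame({
    'Date': ['2019-05-31', '2019-06-01', '2019-06-15', '2019-06-30', '2019-07-01']
})
df['Date'] = pd.to_datetime(df['Date'])

start = '2019-06-01'
end = '2019-06-30'

# Convert the string bounds explicitly before comparing
d1 = pd.to_datetime(start)
d2 = pd.to_datetime(end)

# Variant 1: comparison operators combined with &
by_operators = df[(df['Date'] >= d1) & (df['Date'] <= d2)]

# Variant 2: Series.between, which includes both endpoints by default
by_between = df[df['Date'].between(d1, d2)]

print(by_operators)
print(by_between)
```

Both variants should select the same June rows. Keep in mind that a bound such as '2019-06-30' is parsed as midnight at the start of that day, so if the column held times later on 30 June, those rows would fall outside the range.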
Upvotes: 1