Reputation: 845
For a current project, I am planning to filter a JSON file by time ranges by running several loops, each time with a slightly shifted range. However, the code below yields the following error for the line after_start_date = df["Date"] >= start_date:
TypeError: Invalid comparison between dtype=datetime64[ns] and date
I have already tried modifying the date formatting, both within the Python code and in the corresponding JSON file. Is there a smart tweak to align the date types/formats?
The JSON file has the following format:
[
{"No":"121","Stock Symbol":"A","Date":"05/11/2017","Text Main":"Sample text"}
]
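Because "05/11/2017" could be read as either May 11 or November 5, it can help to parse the column with an explicit format instead of letting pandas guess. A minimal sketch, assuming the dates in the file are month/day/year:

```python
import pandas as pd

# The record from the question; "05/11/2017" is ambiguous on its own.
data = [{"No": "121", "Stock Symbol": "A", "Date": "05/11/2017", "Text Main": "Sample text"}]
df = pd.json_normalize(data)

# Spelling out the format removes the ambiguity (month/day/year is an assumption).
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
print(df["Date"].dtype)   # datetime64[ns] in pandas 1.x/2.x
print(df.loc[0, "Date"])  # 2017-05-11 00:00:00
```

If the file is actually day/month/year, use format="%d/%m/%Y" instead.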
And the corresponding code looks like this:
import string
import json
import pandas as pd
import datetime
from dateutil.relativedelta import *
# Loading and reading dataset
file = open("Glassdoor_A.json", "r")
data = json.load(file)
df = pd.json_normalize(data)
df['Date'] = pd.to_datetime(df['Date'])
# Create an empty dictionary
d = dict()
# Filtering by date
start_date = datetime.date.fromisoformat('2017-01-01')
end_date = datetime.date.fromisoformat('2017-01-31')
for i in df.iterrows():
    start_date += relativedelta(months=+3)
    end_date += relativedelta(months=+3)
    print(start_date)
    print(end_date)
    after_start_date = df["Date"] >= start_date
    before_end_date = df["Date"] <= end_date
    between_two_dates = after_start_date & before_end_date
    filtered_dates = df.loc[between_two_dates]
    print(filtered_dates)
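A minimal reproduction of the mismatch, reduced to one row: the column holds pandas datetime64 values, while datetime.date.fromisoformat returns a plain datetime.date. Whether the comparison raises depends on the pandas version, so the sketch catches the error rather than assuming it:

```python
import datetime

import pandas as pd

df = pd.DataFrame({"Date": pd.to_datetime(["05/11/2017"], format="%m/%d/%Y")})
start_date = datetime.date.fromisoformat("2017-01-01")

print(df["Date"].dtype)  # a datetime64 dtype
print(type(start_date))  # <class 'datetime.date'>

try:
    after_start_date = df["Date"] >= start_date
except TypeError as exc:
    # Versions that reject mixing datetime64 with date report:
    # Invalid comparison between dtype=datetime64[ns] and date
    print("TypeError:", exc)
```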
Upvotes: 13
Views: 31467
Reputation: 860
You can compare your dates against a datetime.datetime object (instead of a datetime.date) using the following method:
from datetime import datetime
df_subset = df.loc[(df['Start_Date'] > datetime.strptime('2018-12-31', '%Y-%m-%d'))]
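A runnable sketch of this approach on hypothetical data, using the question's column name Date instead of Start_Date. As an extension not in the answer, it also shows how an existing datetime.date (like the question's start_date) can be promoted to a datetime.datetime at midnight with datetime.combine:

```python
from datetime import date, datetime

import pandas as pd

# Hypothetical sample data standing in for the question's df.
df = pd.DataFrame({"Date": pd.to_datetime(["2017-05-11", "2019-02-01"])})

# A datetime.datetime (unlike a datetime.date) compares cleanly with datetime64 values.
df_subset = df.loc[df["Date"] > datetime.strptime("2018-12-31", "%Y-%m-%d")]
print(df_subset)

# Promote an existing datetime.date to a datetime.datetime at 00:00 before comparing.
start_date = date(2017, 1, 1)
after_start_date = df["Date"] >= datetime.combine(start_date, datetime.min.time())
print(after_start_date.tolist())  # [True, True]
```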
Upvotes: 0
Reputation: 31011
My general advice is not to use the datetime module. Instead, use pandas' built-in methods and classes such as pd.to_datetime and pd.DateOffset.
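As a sketch of that advice (not code from the answer itself), pd.DateOffset(months=3) can take the place of relativedelta(months=+3) while keeping every value a pandas Timestamp. The sample data is hypothetical, and the half-open windows are a design choice of this sketch:

```python
import pandas as pd

# Hypothetical sample data standing in for the question's df.
df = pd.DataFrame({"Date": pd.to_datetime(["2017-01-10", "2017-04-05", "2017-04-29", "2017-07-31"])})

window_start = pd.to_datetime("2017-01-01")
counts = []
for _ in range(3):
    # Half-open window [start, start + 1 month): no month-end arithmetic needed.
    window_end = window_start + pd.DateOffset(months=1)
    in_window = df[(df["Date"] >= window_start) & (df["Date"] < window_end)]
    counts.append(len(in_window))
    print(window_start.date(), window_end.date(), len(in_window))
    # pd.DateOffset(months=3) replaces relativedelta(months=+3) from the question.
    window_start += pd.DateOffset(months=3)
```

Half-open windows also avoid a month-end drift in the question's approach: with relativedelta(months=+3), 2017-01-31 becomes 2017-04-30, and the next step gives 2017-07-30 rather than 2017-07-31.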
You should also close the input file as soon as it is no longer needed, e.g.:
with open('Glassdoor_A.json', 'r') as file:
    data = json.load(file)
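A self-contained sketch of the whole loading step. The temporary file is a hypothetical stand-in for Glassdoor_A.json and holds the single record from the question; the explicit month/day/year format is an assumption:

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for Glassdoor_A.json.
records = [{"No": "121", "Stock Symbol": "A", "Date": "05/11/2017", "Text Main": "Sample text"}]
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(records, tmp)
    path = tmp.name

# The with-block closes the file as soon as json.load() has returned.
with open(path, "r") as file:
    data = json.load(file)
print(file.closed)  # True

df = pd.json_normalize(data)
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
os.remove(path)
```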
Another odd point in your code is that you loop with for i in df.iterrows(): but never use i (the control variable of this loop), so the loop simply runs once per row of df.
Below is an example of code that looks for rows in consecutive three-month periods, up to some final date, and prints the rows from the current period, if any:
start_date = pd.to_datetime('2017-01-01')
end_date = pd.to_datetime('2017-03-31')
last_end_date = pd.to_datetime('2017-12-31')
# Step three months at a time, keeping windows aligned to month starts/ends
mnthBeg = pd.offsets.MonthBegin(3)
mnthEnd = pd.offsets.MonthEnd(3)
while end_date <= last_end_date:
    # between() includes both endpoints by default
    filtered_rows = df[df.Date.between(start_date, end_date)]
    n = len(filtered_rows.index)
    print(f'Date range: {start_date.strftime("%Y-%m-%d")} - {end_date.strftime("%Y-%m-%d")}, {n} rows.')
    if n > 0:
        print(filtered_rows)
    start_date += mnthBeg
    end_date += mnthEnd
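The question creates an empty dict d but never fills it. A sketch, assuming the goal is to keep one filtered DataFrame per quarter, keyed by the window's start date (a hypothetical keying scheme), using the same offsets as this answer on hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data; in the question, df comes from Glassdoor_A.json.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2017-02-14", "2017-05-11", "2017-05-30", "2017-11-02"]),
    "Text Main": ["a", "b", "c", "d"],
})

start_date = pd.to_datetime("2017-01-01")
end_date = pd.to_datetime("2017-03-31")
last_end_date = pd.to_datetime("2017-12-31")

d = {}  # one filtered DataFrame per quarter
while end_date <= last_end_date:
    d[start_date.strftime("%Y-%m-%d")] = df[df["Date"].between(start_date, end_date)]
    start_date += pd.offsets.MonthBegin(3)
    end_date += pd.offsets.MonthEnd(3)

print({k: len(v) for k, v in d.items()})
# {'2017-01-01': 1, '2017-04-01': 2, '2017-07-01': 0, '2017-10-01': 1}
```

Series.between is inclusive on both ends by default, which matches the original >= and <= comparisons.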
Upvotes: 2
Reputation: 709
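You can use pd.to_datetime('2017-01-31') instead of datetime.date.fromisoformat('2017-01-31'). It returns a pandas Timestamp, which can be compared directly with the datetime64[ns] "Date" column.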
I hope this helps!
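Applied to the filtering lines from the question, the swap looks like this (the three sample dates are hypothetical):

```python
import pandas as pd

# Hypothetical sample data standing in for the question's df.
df = pd.DataFrame({"Date": pd.to_datetime(["2016-12-30", "2017-01-15", "2017-02-03"])})

# pd.to_datetime returns a pandas Timestamp instead of a datetime.date.
start_date = pd.to_datetime("2017-01-01")
end_date = pd.to_datetime("2017-01-31")

after_start_date = df["Date"] >= start_date
before_end_date = df["Date"] <= end_date
filtered_dates = df.loc[after_start_date & before_end_date]
print(filtered_dates)
```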
Upvotes: 14