Malte Susen

Reputation: 845

Python: TypeError: Invalid comparison between dtype=datetime64[ns] and date

For a current project, I am planning to filter a JSON file by time ranges by running several loops, each time with a slightly shifted range. However, the code below raises TypeError: Invalid comparison between dtype=datetime64[ns] and date on the line after_start_date = df["Date"] >= start_date.

I have already tried changing the date format both in the Python code and in the corresponding JSON file. Is there a simple way to align the date types/formats?

The JSON file has the following format:

[
{"No":"121","Stock Symbol":"A","Date":"05/11/2017","Text Main":"Sample text"}
]

And the corresponding code looks like this:

import string
import json

import pandas as pd
import datetime
from dateutil.relativedelta import *


# Loading and reading dataset
file = open("Glassdoor_A.json", "r")
data = json.load(file)
df = pd.json_normalize(data)
df['Date'] = pd.to_datetime(df['Date'])


# Create an empty dictionary
d = dict()

# Filtering by date

start_date = datetime.date.fromisoformat('2017-01-01')
end_date = datetime.date.fromisoformat('2017-01-31')

for i in df.iterrows():
    start_date += relativedelta(months=+3)
    end_date += relativedelta(months=+3)

    print(start_date)
    print(end_date)

    after_start_date = df["Date"] >= start_date
    before_end_date = df["Date"] <= end_date

    between_two_dates = after_start_date & before_end_date
    filtered_dates = df.loc[between_two_dates]

    print(filtered_dates)

Upvotes: 13

Views: 31467

Answers (3)

Terminator17

Reputation: 860

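You can compare your dates against a datetime.datetime object instead of a datetime.date, for example:

from datetime import datetime

# 'Date' is the column from the question; use your own column name here
df_subset = df.loc[df['Date'] > datetime.strptime('2018-12-31', '%Y-%m-%d')]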
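
To illustrate why this helps: a datetime64[ns] column can be compared with a datetime.datetime or a pd.Timestamp, so one way around the error is to promote the datetime.date before comparing. The following is only a minimal sketch; the DataFrame and the variable names are made up for illustration and are not taken from the question's data.

```python
from datetime import date, datetime

import pandas as pd

# Hypothetical stand-in for the question's DataFrame.
df = pd.DataFrame(
    {"Date": pd.to_datetime(["2017-03-15", "2017-05-11", "2017-08-02"])}
)

start_as_date = date(2017, 4, 1)

# Option 1: promote the date to a datetime at midnight.
start_as_datetime = datetime.combine(start_as_date, datetime.min.time())

# Option 2: promote the date to a pandas Timestamp.
start_as_timestamp = pd.Timestamp(start_as_date)

# Both bounds can be compared with the datetime64 column.
mask_datetime = df["Date"] >= start_as_datetime
mask_timestamp = df["Date"] >= start_as_timestamp

print(df.loc[mask_timestamp])
```

Either mask selects the same rows; which conversion to prefer is mostly a matter of taste.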

Upvotes: 0

Valdi_Bo

Reputation: 31011

My general advice is not to use the datetime module here. Use pandas' own methods and classes instead, such as pd.to_datetime and pd.DateOffset.

You should also close the input file as soon as it is no longer needed, e.g.:

with open('Glassdoor_A.json', 'r') as file:
    data = json.load(file)

Other odd points in your code:

  • you wrote a loop over the rows, for i in df.iterrows():, but never use i (the loop variable),
  • your loop actually steps through time rather than through rows, so it should be something like "while end_date <= last_end_date:",
  • start_date and end_date are only one month apart (they are the first and last day of a month), yet the loop shifts both dates by 3 months, so two of every three months are never covered.
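
The last point in the list can be checked by running only the date arithmetic from the question. This is a small sketch using three iterations (an arbitrary number chosen for illustration); the windows list is a hypothetical helper that is not part of the original code.

```python
from datetime import date

from dateutil.relativedelta import relativedelta

start_date = date(2017, 1, 1)
end_date = date(2017, 1, 31)

# Collect the (start, end) pair produced by each pass of the loop.
windows = []
for _ in range(3):
    start_date += relativedelta(months=+3)
    end_date += relativedelta(months=+3)
    windows.append((start_date, end_date))

for window_start, window_end in windows:
    print(window_start, window_end)
```

Each window spans roughly one month out of every three. In addition, once the end date has been clipped to 30 April it stays on the 30th, so the July and October windows end one day before the end of the month.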

Below is example code that looks for rows in consecutive three-month periods, up to some final date, and prints the rows from the current period, if any:

start_date = pd.to_datetime('2017-01-01')
end_date = pd.to_datetime('2017-03-31')
last_end_date = pd.to_datetime('2017-12-31')
mnthBeg = pd.offsets.MonthBegin(3)
mnthEnd = pd.offsets.MonthEnd(3)
while end_date <= last_end_date:
    filtered_rows = df[df.Date.between(start_date, end_date)]
    n = len(filtered_rows.index)
    print(f'Date range: {start_date.strftime("%Y-%m-%d")} - {end_date.strftime("%Y-%m-%d")},  {n} rows.')
    if n > 0:
        print(filtered_rows)
    start_date += mnthBeg
    end_date += mnthEnd
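
As a possible variation on the loop above, the window starts could also be generated up front with pd.date_range, and the row count of each window stored in a dictionary. This is only a sketch under assumptions: the sample DataFrame and the rows_per_window name are invented for illustration, and the three-month window length mirrors the code above.

```python
import pandas as pd

# Hypothetical sample data with a datetime64 "Date" column.
df = pd.DataFrame(
    {"Date": pd.to_datetime(
        ["2017-02-10", "2017-05-11", "2017-05-20", "2017-11-03"]
    )}
)

# First day of every third month within 2017.
window_starts = pd.date_range("2017-01-01", "2017-12-31", freq="3MS")

rows_per_window = {}
for start_date in window_starts:
    # Last day of the three-month window that begins at start_date.
    end_date = start_date + pd.DateOffset(months=3) - pd.Timedelta(days=1)
    # Series.between includes both bounds by default.
    filtered_rows = df[df["Date"].between(start_date, end_date)]
    rows_per_window[start_date.strftime("%Y-%m-%d")] = len(filtered_rows)

print(rows_per_window)
```

Deriving the end date from the start date keeps the two bounds in step, so they cannot drift apart.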

Upvotes: 2

kampmani

Reputation: 709

You can use pd.to_datetime('2017-01-31') instead of datetime.date.fromisoformat('2017-01-31').
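
For example, the filter from the question might look like this with pandas Timestamps as bounds. This is a minimal sketch: the second record is invented, and the month/day/year date format is an assumption about the JSON file, not something the question confirms.

```python
import pandas as pd

# Records in the same shape as the question's JSON file;
# the second record is made up for illustration.
records = [
    {"No": "121", "Stock Symbol": "A", "Date": "05/11/2017", "Text Main": "Sample text"},
    {"No": "122", "Stock Symbol": "A", "Date": "07/20/2017", "Text Main": "Another sample"},
]
df = pd.json_normalize(records)

# Assumption: dates are written as month/day/year.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# pd.to_datetime returns Timestamps, which compare cleanly with the column.
start_date = pd.to_datetime("2017-04-01")
end_date = pd.to_datetime("2017-06-30")

between_two_dates = (df["Date"] >= start_date) & (df["Date"] <= end_date)
filtered_dates = df.loc[between_two_dates]

print(filtered_dates)
```

If the dates are actually day/month/year, the format string would need to be "%d/%m/%Y" instead.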

I hope this helps!

Upvotes: 14
