Reputation: 195
I'm running into a couple of errors while trying to filter out rows where two conditions are both met.
import pyodbc
import pandas as pd
import datetime
from dateutil.relativedelta import relativedelta
effdate = datetime.date(2018,12,31)
conn = pyodbc.connect(
r'DRIVER={ODBC Driver 13 for SQL Server};'
r'SERVER=server;'
r'DATABASE=database;'
r'Trusted_Connection=yes;'
)
strSQL = "" # here is a SQL query which pulls many columns, including SaleDate, which is date format, and CategoryName, which contains text
df_auction = pd.read_sql(strSQL, conn)
priordate_rt = effdate + relativedelta(months=-6)
priordate_rt = pd.Timestamp(priordate_rt)
df_auction['SaleDateAdj'] = pd.to_datetime(df_auction['SaleDate'], format='%Y-%m-%d')
df_auction = df_auction[~((df_auction['CategoryName']=='Cars') & (df_auction['SaleDateAdj']<priordate_rt))]
The last line above raises TypeError: '<' not supported between instances of 'str' and 'int'
I can tell you that this works by itself:
df_test = df_auction[(df_auction['SaleDateAdj']<priordate_rt)]
This line by itself gives me ValueError: cannot reindex from a duplicate axis.
df_test = df_auction[(df_auction['CategoryName']=='Cars')]
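One possible cause of the "cannot reindex from a duplicate axis" error is that the query returns two columns with the same name, for example CategoryName coming from two joined tables. This is only a guess, since the SQL isn't shown. In that case df_auction['CategoryName'] is a DataFrame rather than a Series, and the boolean mask built from it no longer lines up with the rows. A minimal sketch of how to check for this and work around it, using a made-up frame in place of the real query result:

```python
import pandas as pd

# Hypothetical stand-in for the query result: two columns share the label 'CategoryName'
df_auction = pd.DataFrame(
    [["Cars", "Cars", "2018-01-15"], ["Trucks", "Trucks", "2018-11-02"]],
    columns=["CategoryName", "CategoryName", "SaleDate"],
)

# With a repeated label, this selection is a DataFrame, not a Series
selected = df_auction["CategoryName"]
is_frame = isinstance(selected, pd.DataFrame)

# List any column labels that appear more than once
dup_labels = df_auction.columns[df_auction.columns.duplicated()].tolist()

# Keep only the first occurrence of each label, then filter as usual
df_dedup = df_auction.loc[:, ~df_auction.columns.duplicated()]
cars = df_dedup[df_dedup["CategoryName"] == "Cars"]
```

If dup_labels comes back empty on the real data, the cause is something else. Aliasing the clashing columns in the SQL itself would be cleaner than dropping them afterwards.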
Upvotes: 0
Views: 36
Reputation: 1596
Try doing
import pandas as pd
df_test['SaleDate'] = pd.to_datetime(df_test['SaleDate'])
and then perform the comparison to filter the rows out.
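A rough sketch of the whole convert-then-filter sequence, run on made-up sample rows instead of the real query result. The cutoff here is computed with pd.DateOffset instead of the relativedelta used in the question, purely to keep the example self-contained:

```python
import pandas as pd

# Made-up sample data standing in for the query result
df_auction = pd.DataFrame({
    "CategoryName": ["Cars", "Cars", "Trucks"],
    "SaleDate": ["2018-01-15", "2018-11-02", "2018-02-20"],
})

# Six months before the 2018-12-31 effective date from the question
priordate_rt = pd.Timestamp("2018-12-31") - pd.DateOffset(months=6)

# Convert the text dates to datetime64 so they compare against a Timestamp
df_auction["SaleDate"] = pd.to_datetime(df_auction["SaleDate"])

# Rows to drop: category is Cars AND the sale is older than the cutoff
old_cars = (df_auction["CategoryName"] == "Cars") & (df_auction["SaleDate"] < priordate_rt)
df_filtered = df_auction[~old_cars]
```

Naming the mask (old_cars) before negating it makes it easy to inspect each condition separately if one of them fails.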
Upvotes: 1