Reputation: 59
I have to find difference between two date columns in python dataframe and to compare whether difference is greater than 120 or not
if working_data['CLAIMS_EVENT_DATE'] - working_data['LAST_LAPSED_DATE'] > 120:
I got below Error
invalid_comparison .format(dtype=left.dtype, typ=type(right).name))
TypeError: Invalid comparison between dtype=timedelta64[ns] and int
Upvotes: 0
Views: 79
Reputation: 66
#Convert both columns to datetime format
working_data['CLAIMS_EVENT_DATE'] = pd.to_datetime(working_data['CLAIMS_EVENT_DATE'])
working_data['LAST_LAPSED_DATE'] = pd.to_datetime(working_data['LAST_LAPSED_DATE'])
#Calculate the difference between the days
working_data['Days'] = (working_data['LAST_LAPSED_DATE']
- working_data['CLAIMS_EVENT_DATE']).days
#Create a column 'Greater' and check whether difference is greater than 120 or not
working_data.loc[working_data.Days <= 120, 'Greater'] = 'False'
working_data.loc[working_data.Days > 120, 'Greater'] = 'True'
Upvotes: 0
Reputation: 863741
If compare both get timedeltas, so for compare are possible 2 solutions - compare days by Series.dt.days
with Series.any
if need test if at least one value match condition:
s = (working_data['CLAIMS_EVENT_DATE'] - working_data['LAST_LAPSED_DATE'])
if (s.dt.days > 120).any():
print ('At least one value is higher')
Or compare by Timedelta
:
if (s > pd.Timedelta(120, unit='d')).any():
print ('At least one value is higher')
If need fitler rows use boolean indexing
:
df = working_data[s.dt.days > 120]
Or:
df = working_data[s > pd.Timedelta(120, unit='d')]
Upvotes: 1