Abhay kumar

Reputation: 225

TypeError: ("Cannot compare type 'Timestamp' with type 'str'", 'occurred at index 262224')

I am trying to create a flag column from a datetime column, but I get the error above after applying the function below.

def f(r):
    if r['balance_dt'] <= '2016-11-30':
        return 0
    else:
        return 1
df_obctohdfc['balance_dt_flag'] = df_obctohdfc.apply(f,axis=1)

Upvotes: 1

Views: 5464

Answers (2)

jezrael

Reputation: 862541

In pandas it is best to avoid loops, and a loop is how apply works under the hood.

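I think you need to convert the string to a datetime, compare the column against it, and then cast the boolean mask to integer (True becomes 1, False becomes 0). Note that <= is changed to > so that dates after the cutoff get 1: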

timestamp = pd.to_datetime('2016-11-30')
df_obctohdfc['balance_dt_flag'] = (df_obctohdfc['balance_dt'] > timestamp).astype(int)

Sample:

import pandas as pd

rng = pd.date_range('2016-11-27', periods=10)
df_obctohdfc = pd.DataFrame({'balance_dt': rng})
#print (df_obctohdfc)

timestamp = pd.to_datetime('2016-11-30')
df_obctohdfc['balance_dt_flag'] = (df_obctohdfc['balance_dt'] > timestamp).astype(int)
print (df_obctohdfc)

  balance_dt  balance_dt_flag
0 2016-11-27                0
1 2016-11-28                0
2 2016-11-29                0
3 2016-11-30                0
4 2016-12-01                1
5 2016-12-02                1
6 2016-12-03                1
7 2016-12-04                1
8 2016-12-05                1
9 2016-12-06                1
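One caveat when swapping <= for >: the two forms only agree if the column has no missing dates. A minimal sketch, assuming a hypothetical frame with one NaT value (the names df, cutoff, flag_gt and flag_not_le are made up for illustration). Comparisons against NaT evaluate to False, so > gives 0 for a missing date, while negating <= reproduces the else branch of the original function and gives 1:

```python
import pandas as pd

# Hypothetical sample data with one missing date (NaT); not from the question.
df = pd.DataFrame(
    {"balance_dt": pd.to_datetime(["2016-11-29", "2016-11-30", "2016-12-01", None])}
)
cutoff = pd.Timestamp("2016-11-30")

# Strictly-after comparison: NaT compares as False, so the missing row gets 0.
df["flag_gt"] = (df["balance_dt"] > cutoff).astype(int)

# Negated <= mirrors the original if/else: NaT is not <= cutoff, so it gets 1.
df["flag_not_le"] = (~(df["balance_dt"] <= cutoff)).astype(int)

print(df)
```

Which behaviour is wanted for missing dates depends on the data, so it is worth deciding explicitly rather than relying on either default.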

Timing comparison on a 1000-row DataFrame:

In [140]: %timeit df_obctohdfc['balance_dt_flag1'] = (df_obctohdfc['balance_dt'] > timestamp).astype(int)
1000 loops, best of 3: 368 µs per loop

In [141]: %timeit df_obctohdfc['balance_dt_flag2'] = df_obctohdfc.apply(f,axis=1)
10 loops, best of 3: 91.2 ms per loop

Setup:

import pandas as pd

rng = pd.date_range('2015-11-01', periods=1000)
df_obctohdfc = pd.DataFrame({'balance_dt': rng})
#print (df_obctohdfc)

timestamp = pd.to_datetime('2016-11-30')

import datetime
def f(r):
    if r['balance_dt'] <= datetime.datetime.strptime('2016-11-30', '%Y-%m-%d'):
        return 0
    else:
        return 1
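
The %timeit magic used above only works in IPython. A rough sketch of the same comparison as a plain script, using the standard timeit module; the helper names (row_flag, vectorized, row_wise) and the repeat counts are assumptions for illustration, and the absolute numbers will differ by machine and pandas version:

```python
import timeit

import pandas as pd

rng = pd.date_range("2015-11-01", periods=1000)
df = pd.DataFrame({"balance_dt": rng})
cutoff = pd.Timestamp("2016-11-30")


def row_flag(r):
    # Row-wise version: 0 up to and including the cutoff, 1 afterwards.
    return 0 if r["balance_dt"] <= cutoff else 1


def vectorized():
    # Whole-column comparison, then cast the boolean mask to int.
    return (df["balance_dt"] > cutoff).astype(int)


def row_wise():
    # apply with axis=1 calls row_flag once per row.
    return df.apply(row_flag, axis=1)


# Both approaches should give the same flags when there are no missing dates.
same = bool((vectorized() == row_wise()).all())

# Average seconds per call over a small number of runs.
t_vec = timeit.timeit(vectorized, number=20) / 20
t_row = timeit.timeit(row_wise, number=3) / 3

print("results match:", same)
print("vectorized: %.6f s per call" % t_vec)
print("apply:      %.6f s per call" % t_row)
```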

Upvotes: 1

Rakesh

Reputation: 82765

The error you are getting occurs because you are comparing a datetime object (a pandas Timestamp) with a string. Convert the string to a datetime before comparing.

Example:

import datetime
def f(r):
    if r['balance_dt'] <= datetime.datetime.strptime('2016-11-30', '%Y-%m-%d'):
        return 0
    else:
        return 1
df_obctohdfc['balance_dt_flag'] = df_obctohdfc.apply(f,axis=1)
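
To see the cause in isolation, here is a small sketch on a single value instead of a whole DataFrame. The variable names (ts, cutoff, raised, flag) are made up for illustration, and the exact error message depends on the pandas version:

```python
import pandas as pd

ts = pd.Timestamp("2016-11-27")

# Comparing a single Timestamp with a plain string is expected to raise TypeError.
try:
    ts <= "2016-11-30"
    raised = False
except TypeError:
    raised = True

# Converting the string first makes both sides Timestamps, so the comparison works.
cutoff = pd.Timestamp("2016-11-30")
flag = 0 if ts <= cutoff else 1

print(raised, flag)
```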

Note: It is better to do it the way jezrael has mentioned. The vectorized comparison is the right way to do it.

Upvotes: 1
