George Adams
George Adams

Reputation: 341

How to solve a Pandas Merge Error: key must be integer or timestamp?

I'm trying to merge to pandas dataframes, one is called DAILY and the other SF1.

DAILY csv:

ticker,date,lastupdated,ev,evebit,evebitda,marketcap,pb,pe,ps
A,2020-09-14,2020-09-14,31617.1,36.3,26.8,30652.1,6.2,44.4,5.9

SF1 csv (not sure why ticker is indented, just ignore that):

    ticker,dimension,calendardate,datekey,reportperiod,lastupdated,accoci,assets,assetsavg,assetsc,assetsnc,assetturnover,bvps,capex,cashneq,cashnequsd,cor,consolinc,currentratio,de,debt,debtc,debtnc,debtusd,deferredrev,depamor,deposits,divyield,dps,ebit,ebitda,ebitdamargin,ebitdausd,ebitusd,ebt,eps,epsdil,epsusd,equity,equityavg,equityusd,ev,evebit,evebitda,fcf,fcfps,fxusd,gp,grossmargin,intangibles,intexp,invcap,invcapavg,inventory,investments,investmentsc,investmentsnc,liabilities,liabilitiesc,liabilitiesnc,marketcap,ncf,ncfbus,ncfcommon,ncfdebt,ncfdiv,ncff,ncfi,ncfinv,ncfo,ncfx,netinc,netinccmn,netinccmnusd,netincdis,netincnci,netmargin,opex,opinc,payables,payoutratio,pb,pe,pe1,ppnenet,prefdivis,price,ps,ps1,receivables,retearn,revenue,revenueusd,rnd,roa,roe,roic,ros,sbcomp,sgna,sharefactor,sharesbas,shareswa,shareswadil,sps,tangibles,taxassets,taxexp,taxliabilities,tbvps,workingcapital
A,ARQ,2020-09-14,2020-09-14,2020-09-14,2020-09-14,53000000,7107000000,,4982000000,2125000000,,10.219,-30000000,1368000000,1368000000,1160000000,131000000,2.41,0.584,665000000,111000000,554000000,665000000,281000000,96000000,0,0.0,0.0,202000000,298000000,0.133,298000000,202000000,202000000,0.3,0.3,0.3,4486000000,,4486000000,50960600000,,,354000000,0.806,1.0,1086000000,0.484,0,0,4337000000,,1567000000,42000000,42000000,0,2621000000,2067000000,554000000,51663600000,1368000000,-160000000,2068000000,111000000,0,1192000000,-208000000,-42000000,384000000,0,131000000,131000000,131000000,0,0,0.058,915000000,171000000,635000000,0.0,11.517,,,1408000000,0,114.3,,,1445000000,131000000,2246000000,2246000000,290000000,,,,,0,625000000,1.0,452000000,439000000,440000000,5.116,7107000000,0,71000000,113000000,16.189,2915000000

Datasorting/cleaning code:

sf1 = sf1.drop(columns=['number','dimension', 'datekey', 'reportperiod','lastupdated', 'ev', 'evebit', 'evebitda', 'marketcap', 'pb', 'pe', 'ps'])

daily = daily.sort_values('date', ascending=True)
sf1 = sf1.sort_values('calendardate', ascending=True)
daily = daily.sort_values('ticker')
sf1 = sf1.sort_values('ticker')

Code to merge the dataframes:

df = pd.merge_asof(daily, sf1, by = 'ticker', left_on='date', right_on='calendardate', tolerance=pd.Timedelta(value=100, unit='D'), direction='backward')

I think what may be causing the error is that the dataframes are being merged by the ticker column. I'm not sure if that has to be an int or in a dateformat, or any specific format. Currently it is just the ticker of companies like shown above.

The dataframes are being merged on the date column in the DAILY csv and the calendardate column in the SF1 csv.

If someone could also make the distinction between what happens when merging by and how that changes if you were to only have left_on and right_on.

Upvotes: 0

Views: 154

Answers (1)

Anurag Dabas
Anurag Dabas

Reputation: 24314

You are facing this problem because your date column in 'daily' and calendardate column in 'sf1' are of type object i.e string

Just change their type to datatime by pd.to_datetime() method

so just add these 2 lines of code in your Datasorting/cleaning code:-

daily['date']=pd.to_datetime(daily['date'])
sf1['calendardate']=pd.to_datetime(sf1['calendardate'])

Now write:-

df = pd.merge_asof(daily, sf1, by = 'ticker', left_on='date', right_on='calendardate', tolerance=pd.Timedelta(value=100, unit='D'), direction='backward')

Upvotes: 1

Related Questions