arnyeinstein
arnyeinstein

Reputation: 1013

Join produces nan

I want to merge tables using on the column "Technology". The merge results in nan instead of the values of the second table.

I tried converting the column being used to merge to strings, factors or integers, but the problem remains

import pandas as pd
multipliers = pd.read_csv("multipliers.csv")
multipliers.Technology.astype('str')
generators = pd.read_csv("generators.csv")
generators.Technology.astype('str')
mergeddf = generators.merge(multipliers, on = 'Technology', how = 'left')

generators.csv:

,idIntBus,idGen,idProfile,Technology,HedgeRatio,meanErrorForecast24h,sigmaErrorForecast24h,Pmax,Pmin,Qmax,Qmin,Emax,Emin,E_ini,VarCost,UnitType,CandidateUnit,Inc,FOC,IncE
0,1,1,15.0,wind,,0.0092,0.045,3000.0,0.0,480.0,-480.0,,,,0.0,NonDispatchable,1,100000.0,40000.0,
1,1,2,14.0,pv,,0.066,0.0,12000.0,0.0,480.0,-480.0,,,,0.0,NonDispatchable,1,100000.0,25000.0,

multipliers.csv:

,Technology,VC,FC
0,gas         ,1.0,1.0
1,wind        ,1.0,1.0
2,pv          ,1.0,1.0

I expect to find the columns of the multiplier table as floats not as nans.

Upvotes: 0

Views: 293

Answers (2)

Yash Shah
Yash Shah

Reputation: 141

Two things:

  1. astype() returns a dataframe and doesn't do it inplace, so change it to:

    multipliers['Technology'] = multipliers.Technology.astype('str')
    
  2. Use strip() as there are trailing spaces

    generators['Technology'] = generators['Technology'].apply(lambda x: x.strip())
    

So, your code will be:

import pandas as pd
multipliers = pd.read_csv("multipliers.csv")
multipliers['Technology'] = multipliers.Technology.astype('str')
generators = pd.read_csv("generators.csv")
generators['Technology'] = generators.Technology.astype('str')
generators['Technology'] = generators['Technology'].apply(lambda x: x.strip())
mergeddf = generators.merge(multipliers, on = 'Technology', how = 'left')

Upvotes: 1

jezrael
jezrael

Reputation: 863176

I think there should be some whitespaces or similar, convert both columns to list for test it:

print (multipliers.Technology.tolist())
print (generators.Technology.tolist())

So possible solution is Series.str.strip, if necessary convert values to strings before:

multipliers.Technology= multipliers.Technology.astype('str').str.strip()
generators.Technology= generators.Technology.astype('str').str.strip()

mergeddf = generators.merge(multipliers, on = 'Technology', how = 'left')

Also for test only matched values of columns is possible use default inner join:

print(generators.merge(multipliers, on = 'Technology'))

Upvotes: 1

Related Questions