Reputation: 11
I want to rename the common key column in my 3 datasets to EID and then merge them, but rename does not seem to change the column name to EID. How can I solve it?
Also, can I merge all 3 datasets with a single pd.merge call, instead of merging them one pair at a time?
Thanks
data1516 = pd.read_csv('C:/data2015_2016.csv', sep='|', names=None, header=1, encoding='latin-1')
data1617 = pd.read_csv('C:/data2016_2017.csv', sep='|', names=None, header=1, encoding='latin-1')
data1718 = pd.read_csv('C:/data2017_2018.csv', sep='|', names=None, header=1, encoding='latin-1')
data1516.rename(index=str, columns={"Employer: ID" : "EID"})
data1617.rename(index=str, columns={"Employer: ID" : "EID"})
data1718.rename(index=str, columns={"Employer: ID" : "EID"})
data1517 = pd.merge(data1516, data1617, on='EID', how='outer')
Upvotes: 1
Views: 592
Reputation: 323356
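rename returns a new DataFrame instead of modifying the original, so assign the result back (or pass inplace=True). After that, you can merge all three frames in one expression by using functools.reduce: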
import functools

# Keep the renamed frames; rename does not change them in place.
data1516 = data1516.rename(columns={"Employer: ID": "EID"})
data1617 = data1617.rename(columns={"Employer: ID": "EID"})
data1718 = data1718.rename(columns={"Employer: ID": "EID"})

# Merge the frames pairwise, left to right, on the shared key.
l = [data1516, data1617, data1718]
df = functools.reduce(lambda x, y: pd.merge(x, y, on='EID'), l)
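For anyone who wants to try this without the CSV files, here is a minimal self-contained sketch of the same approach. The toy frames, the wage_* column names and the values are made up for illustration, not taken from the question. It also passes how='outer', as in the question's own merge, so employers missing from some years are kept; leave it out to get the default inner join.

```python
import functools

import pandas as pd

# Hypothetical toy frames standing in for the three CSV files.
data1516 = pd.DataFrame({"Employer: ID": [1, 2, 3], "wage_1516": [10, 20, 30]})
data1617 = pd.DataFrame({"Employer: ID": [2, 3, 4], "wage_1617": [21, 31, 41]})
data1718 = pd.DataFrame({"Employer: ID": [3, 4, 5], "wage_1718": [32, 42, 52]})

# rename returns a new DataFrame, so the result has to be kept.
frames = [
    df.rename(columns={"Employer: ID": "EID"})
    for df in (data1516, data1617, data1718)
]

# Fold the list into a single frame by merging pairwise on EID.
merged = functools.reduce(
    lambda left, right: pd.merge(left, right, on="EID", how="outer"),
    frames,
)
print(merged)
```

If the real files share other column names besides the key, pd.merge will add _x / _y suffixes to the overlapping ones.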
Upvotes: 1
Reputation: 210932
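This should do the trick. Note that pd.concat with axis=1 lines rows up by index, not by EID, so it only matches a merge on EID if the rows are already aligned or EID is set as the index first: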
dfs = [data1516, data1617, data1718]
df = pd.concat([x.rename(columns={"Employer: ID" : "EID"}) for x in dfs], axis=1)
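A hedged sketch of a key-aligned variant of the concat approach, using made-up toy frames (the wage_* columns and values are hypothetical). It assumes EID is unique within each file, since concat along axis=1 cannot align indexes that contain duplicate values.

```python
import pandas as pd

# Hypothetical toy frames; column names and values are illustrative only.
data1516 = pd.DataFrame({"Employer: ID": [1, 2], "wage_1516": [10, 20]})
data1617 = pd.DataFrame({"Employer: ID": [2, 3], "wage_1617": [21, 31]})
data1718 = pd.DataFrame({"Employer: ID": [3, 1], "wage_1718": [32, 12]})
dfs = [data1516, data1617, data1718]

# Rename the key, then make it the index so concat aligns rows on EID.
keyed = [
    x.rename(columns={"Employer: ID": "EID"}).set_index("EID")
    for x in dfs
]

# axis=1 places the frames side by side; the default join keeps every EID.
df = pd.concat(keyed, axis=1).rename_axis("EID").reset_index()
print(df)
```

Employers missing from a given year get NaN in that year's columns, which behaves like an outer merge on EID.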
Upvotes: 1