Nora

Reputation: 11

Python Pandas - Rename and merge of 3 datasets

I want to rename the common key column in three datasets and then merge them, but rename doesn't seem to change the column name to EID. How can I fix this?

Also, can I merge all three datasets with a single pd.merge call instead of merging them pairwise?

Thanks

data1516 = pd.read_csv('C:/data2015_2016.csv', sep='|', names=None, header=1, encoding='latin-1')    
data1617 = pd.read_csv('C:/data2016_2017.csv', sep='|', names=None, header=1, encoding='latin-1')    
data1718 = pd.read_csv('C:/data2017_2018.csv', sep='|', names=None, header=1, encoding='latin-1')

data1516.rename(index=str, columns={"Employer: ID" : "EID"})    
data1617.rename(index=str, columns={"Employer: ID" : "EID"})    
data1718.rename(index=str, columns={"Employer: ID" : "EID"})    
data1517 = pd.merge(data1516, data1617, on='EID', how='outer')

Upvotes: 1

Views: 592

Answers (2)

BENY

Reputation: 323356

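rename returns a new DataFrame instead of modifying the original, so assign the result back. pd.merge joins only two frames at a time, so use functools.reduce to chain it over a list: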

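import functools

# rename returns a new DataFrame, so assign the result back
data1516 = data1516.rename(columns={"Employer: ID": "EID"})
data1617 = data1617.rename(columns={"Employer: ID": "EID"})
data1718 = data1718.rename(columns={"Employer: ID": "EID"})

# reduce applies pd.merge pairwise across the list
# (default how='inner'; pass how='outer' to keep unmatched EIDs)
dfs = [data1516, data1617, data1718]
df = functools.reduce(lambda x, y: pd.merge(x, y, on='EID'), dfs)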
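
To see both points in isolation, here is a minimal sketch on made-up data. The three small frames and the wage1516/wage1617/wage1718 column names are hypothetical stand-ins for the real CSV files, and how='outer' is taken from the question rather than from the snippet above:

```python
import functools
import pandas as pd

# Hypothetical toy frames standing in for the three yearly files
data1516 = pd.DataFrame({"Employer: ID": [1, 2], "wage1516": [10, 20]})
data1617 = pd.DataFrame({"Employer: ID": [2, 3], "wage1617": [21, 31]})
data1718 = pd.DataFrame({"Employer: ID": [1, 3], "wage1718": [12, 32]})

# Calling rename without assigning the result leaves the original untouched
data1516.rename(columns={"Employer: ID": "EID"})
still_old = "Employer: ID" in data1516.columns

# Rename every frame and keep the returned copies
frames = [
    df.rename(columns={"Employer: ID": "EID"})
    for df in (data1516, data1617, data1718)
]

# Chain pairwise outer merges on EID across the whole list
merged = functools.reduce(
    lambda left, right: pd.merge(left, right, on="EID", how="outer"),
    frames,
)
print(merged)
```

Every EID that appears in any year gets one row, with NaN where a year has no record for it. If the frames shared other column names besides EID, pd.merge would add _x/_y suffixes, so it may be worth giving those columns year-specific names first.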

Upvotes: 1

MaxU - stand with Ukraine

Reputation: 210932

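This should do the trick, provided the rows of the three frames are already aligned on their index (pd.concat with axis=1 lines rows up by index, not by EID):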

dfs = [data1516, data1617, data1718]
df = pd.concat([x.rename(columns={"Employer: ID" : "EID"}) for x in dfs], axis=1)
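
If the rows are not already aligned, one possible variation is to make EID the index before concatenating. This is a sketch on made-up data (the two toy frames and wage columns are hypothetical), and it assumes EID is unique within each frame:

```python
import pandas as pd

# Hypothetical toy frames; EID values only partly overlap
data1516 = pd.DataFrame({"Employer: ID": [1, 2], "wage1516": [10, 20]})
data1617 = pd.DataFrame({"Employer: ID": [2, 3], "wage1617": [21, 31]})

dfs = [data1516, data1617]
renamed = [x.rename(columns={"Employer: ID": "EID"}) for x in dfs]

# Default RangeIndex: rows are paired by position and EID shows up twice
by_position = pd.concat(renamed, axis=1)

# EID as index: rows are paired by employer, like an outer join on EID
by_eid = pd.concat([x.set_index("EID") for x in renamed], axis=1)
print(by_eid)
```

With the default index, employer 1 from the first frame ends up on the same row as employer 2 from the second, which is usually not what a merge on EID is meant to do.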

Upvotes: 1
