Jane Alice
Jane Alice

Reputation: 91

Select only certain columns from Merged dataframe using Python Pandas

I have two dataframes (excel sheets) and I want to select only certain columns from a resulting pd.merge

df1 = pd.read_excel('Receivables.xlsx')
#the sheet looks like the table below
ID Dollar Type Q1 Q2 Q3 Q4
1 Receivables 10 20 30 44
2 Receivables 11 21 41 54
df2 = pd.read_excel('Payables.xlsx')
#the sheet looks like the table below
Payable_ID Dollar Type Q1 Q2 Q3 Q4
1 Payables 10 -20 -30 -44
2 Payables 11 -21 -41 -54

My merge then looks like this

leftJoin = df1.merge(df2, left_on=['ID'], right_on=['Payable_ID'], how='left')

How do I only select columns Dollar Type, Q1, Q2 , Q3 , Q4 from df1 and select Payable_ID, Q1, Q2 from df2. The problem I'm running into is that the duplicate columns are appended with either _x or _y. While I get why it's doing that, is there a way to turn off this appending?

In SQL, there's no need for me to specify, for example, df1.[Dollar Type_x] or df2.[Q1_y]

SELECT DF1.[DOLLAR TYPE],
   DF1.[Q1],
   DF1.[Q2],
   DF1.[Q3],
   DF1.[Q4],
   DF2.[PAYABLE_ID],
   DF2.[Q1],
   DF2.[Q2]
FROM DF1 LEFT JOIN DF2 ON DF1.ID = DF.PAYABLE_ID

Upvotes: 1

Views: 854

Answers (1)

cout112
cout112

Reputation: 26

You can pass only selected columns of the DataFrames to merge:

leftJoin = df1[['ID','Dollar Type','Q1','Q2','Q3','Q4']].merge(df2[['Payable_ID','Q1','Q2']], left_on=['ID'], right_on=['Payable_ID'], how='left')

Now, you have undesired 'ID' column in the merged DataFrame which you can drop using

leftJoin.drop(columns=['ID'], inplace=True)

You can also make the drop inline with the merge:

leftJoin = df1[['ID','Dollar Type','Q1','Q2','Q3','Q4']].merge(df2[['Payable_ID','Q1','Q2']], left_on=['ID'], right_on=['Payable_ID'], how='left').drop(columns=['ID'], inplace=True)

To change the names of the columns, you can use rename:

leftJoin.rename(columns={'Q1_x':'Q1', 'Q2_x':'Q2','Q1_y':'Q5','Q2_y':'Q6'}, inplace=True)

Which again can be done in the same line:

leftJoin = df1[['ID','Dollar Type','Q1','Q2','Q3','Q4']].merge(df2[['Payable_ID','Q1','Q2']], left_on=['ID'], right_on=['Payable_ID'], how='left').drop(columns=['ID'], inplace=True).rename(columns={'Q1_x':'Q1', 'Q2_x':'Q2','Q1_y':'Q5','Q2_y':'Q6'}, inplace=True)

Upvotes: 1

Related Questions