babz
babz

Reputation: 479

merge in pandas and output only selected columns

Is there a way to do a merge in pandas limiting the columns you want to see?

What I have:

df1

ID Col1 Col2 Col3 Col4
1   1    1    1    D
2   A    C    C    4
3   B    B    B    d
4   X    2    3    6

df2

ID ColA ColB ColC ColD
1   1    1    1    D
2   A    C    X    4
3   B    B    Y    d

What I want:

df_final

ID ColA ColB ColC ColD
1   NA   NA   NA   NA
2   A    C    X    4
3   B    B    Y    d
4   NA   NA   NA   NA

I want to do a left join on two dataframes (keeping all IDs from df1) but I only want to keep the columns from df2. I also only want values if Col3 from df1 is either C or B.

The following works but the resulting df includes all columns from both dfs. I can add a third line to only see the columns I want but this is a simple example. In reality I have much larger datasets and its difficult to manually input all the column names I want to keep.

df=pd.merge(df1,df2,how='left',on='ID')
df_final=df[df['Col3'].isin['C','B']]

Equivalent SQL would be

create table df_final as 
select b.*
from df1 a
left join df2 b
on a.ID=b.ID
where a.Col3 in ('C','B')

Upvotes: 3

Views: 5990

Answers (2)

Neel
Neel

Reputation: 370

This should do the trick

df=pd.merge(df1[df1.Col3.isin(['C','B'])][['ID']], df2, how='left', on='ID')

Upvotes: 0

cs95
cs95

Reputation: 402263

Mask df1 with your isin condition before the merge:

df1.where(df1.Col3.isin(['C', 'B']))[['ID']].merge(df2, how='left', on='ID')

Or,

df1.mask(~df1.Col3.isin(['C', 'B']))[['ID']].merge(df2, how='left', on='ID')

    ID ColA ColB ColC ColD
0  NaN  NaN  NaN  NaN  NaN
1    2    A    C    X    4
2    3    B    B    Y    d
3  NaN  NaN  NaN  NaN  NaN

Upvotes: 3

Related Questions