Reputation: 479
Is there a way to do a merge in pandas limiting the columns you want to see?
What I have:
df1
ID Col1 Col2 Col3 Col4
1 1 1 1 D
2 A C C 4
3 B B B d
4 X 2 3 6
df2
ID ColA ColB ColC ColD
1 1 1 1 D
2 A C X 4
3 B B Y d
What I want:
df_final
ID ColA ColB ColC ColD
1 NA NA NA NA
2 A C X 4
3 B B Y d
4 NA NA NA NA
I want to do a left join on two dataframes (keeping all IDs from df1) but I only want to keep the columns from df2. I also only want values if Col3 from df1 is either C or B.
The following works but the resulting df includes all columns from both dfs. I can add a third line to only see the columns I want but this is a simple example. In reality I have much larger datasets and its difficult to manually input all the column names I want to keep.
df=pd.merge(df1,df2,how='left',on='ID')
df_final=df[df['Col3'].isin['C','B']]
Equivalent SQL would be
create table df_final as
select b.*
from df1 a
left join df2 b
on a.ID=b.ID
where a.Col3 in ('C','B')
Upvotes: 3
Views: 5990
Reputation: 370
This should do the trick
df=pd.merge(df1[df1.Col3.isin(['C','B'])][['ID']], df2, how='left', on='ID')
Upvotes: 0
Reputation: 402263
Mask df1
with your isin
condition before the merge
:
df1.where(df1.Col3.isin(['C', 'B']))[['ID']].merge(df2, how='left', on='ID')
Or,
df1.mask(~df1.Col3.isin(['C', 'B']))[['ID']].merge(df2, how='left', on='ID')
ID ColA ColB ColC ColD
0 NaN NaN NaN NaN NaN
1 2 A C X 4
2 3 B B Y d
3 NaN NaN NaN NaN NaN
Upvotes: 3