Reputation: 9658
I want to merge two dataframes like:
df1.columns = A, B, C, E, ..., D
df2.columns = A, B, C, F, ..., D
If I merge them, it merges on all columns. Also since the number of columns is high I don't want to specify them in on
. I prefer to exclude the columns which I don't want to be merged. How can I do that?
mdf = pd.merge(df1, df2, exclude D)
I expect the result be like:
mdf.columns = A, B, C, E, F ..., D_x, D_y
Upvotes: 1
Views: 3774
Reputation: 9658
One solution is using intersection
and then difference
on df1
and df2
columns:
mdf = pd.merge(df1, df2, on=df1.columns.intersection(df2.columns).difference(['D']).tolist())
The other solution could be renaming
columns you want to exclude from merge:
df2.rename(columns={"D":"D_y"}, inplace=True)
mdf = pd.merge(df1, df2)
Upvotes: 0
Reputation: 260335
What about dropping the unwanted column after the merge?
You can use pandas.DataFrame.drop
:
mdf = pd.merge(df1, df2).drop('D', axis=1)
or dropping before the merge:
mdf = pd.merge(df1.drop('D', axis=1), df2.drop('D', axis=1))
Upvotes: 1
Reputation: 71560
You mentioned you mentioned you don't want to use on
"since the number of columns is much".
You could still use on
this way even if there are a lot of columns:
mdf = pd.merge(df1, df2, on=[i for i in df1.columns if i != 'D'])
Or
By using pd.Index.difference
mdf = pd.merge(df1, df2, on=df1.columns.difference(['D']).tolist())
Upvotes: 3
Reputation: 1858
Another solution can be:
mdf = pd.merge(df1, df2, on= df1.columns.tolist().remove('D')
Upvotes: 3