Ahmad
Ahmad

Reputation: 9658

pandas merge by excluding certain columns from merge

I want to merge two dataframes like:

df1.columns = A, B, C, E, ..., D

df2.columns = A, B, C, F, ..., D

If I merge them, it merges on all columns. Also since the number of columns is high I don't want to specify them in on. I prefer to exclude the columns which I don't want to be merged. How can I do that?

mdf = pd.merge(df1, df2, exclude D)

I expect the result be like:

mdf.columns = A, B, C, E, F ..., D_x, D_y

Upvotes: 1

Views: 3774

Answers (4)

Ahmad
Ahmad

Reputation: 9658

One solution is using intersection and then difference on df1 and df2 columns:

mdf = pd.merge(df1, df2, on=df1.columns.intersection(df2.columns).difference(['D']).tolist())

The other solution could be renaming columns you want to exclude from merge:

df2.rename(columns={"D":"D_y"}, inplace=True)
mdf = pd.merge(df1, df2)

Upvotes: 0

mozway
mozway

Reputation: 260335

What about dropping the unwanted column after the merge?

You can use pandas.DataFrame.drop:

mdf = pd.merge(df1, df2).drop('D', axis=1)

or dropping before the merge:

mdf = pd.merge(df1.drop('D', axis=1), df2.drop('D', axis=1))

Upvotes: 1

U13-Forward
U13-Forward

Reputation: 71560

You mentioned you mentioned you don't want to use on "since the number of columns is much".

You could still use on this way even if there are a lot of columns:

mdf = pd.merge(df1, df2, on=[i for i in df1.columns if i != 'D'])

Or

By using pd.Index.difference

mdf = pd.merge(df1, df2, on=df1.columns.difference(['D']).tolist())

Upvotes: 3

BlackMath
BlackMath

Reputation: 1858

Another solution can be:

mdf = pd.merge(df1, df2, on= df1.columns.tolist().remove('D')

Upvotes: 3

Related Questions