Match columns pandas Dataframe

Question

I want to match two pandas Dataframes by the name of their columns.

import pandas as pd
df1 = pd.DataFrame([[0,2,1],[1,3,0],[0,4,0]], columns=['A', 'B', 'C'])


    A   B   C
0   0   2   1
1   1   3   0
2   0   4   0

df2 = pd.DataFrame([[0,0,1],[1,5,0],[0,7,0]], columns=['A', 'B', 'D'])

    A   B   D
0   0   0   1
1   1   5   0
2   0   7   0

If the names match, do nothing. (Keep the column of df2)

If a column is in Dataframe 1 but not in Dataframe 2, add the column in Dataframe 2 as a vector of zeros.

If a column is in Dataframe 2 but not in Dataframe 1, drop it.

The output should look like this:

    A   B   C
0   0   0   0
1   1   5   0
2   0   7   0

I know if I do:

df2 = df2[df1.columns]

I get:

KeyError: "['C'] not in index"

I could also add the vectors of zeros manually, but of course this is a toy example of a much longer dataset. Is there any smarter/pythonic way of doing this?

Henry Ecker · Accepted Answer

It appears that df2 columns should be the same as df1 columns after this operation, as columns that are in df1 and not df2 should be added, while columns only in df2 should be removed. We can simply reindex df2 to match df1 columns with a fill_value=0 (this is the safe equivalent to df2 = df2[df1.columns] when adding new columns with a fill value):

df2 = df2.reindex(columns=df1.columns, fill_value=0)

df2:

Match columns pandas Dataframe

Answers (1)

Related Questions