Jarad
Jarad

Reputation: 18953

Python Pandas drop columns not found in both dataframes

Problem Setup:

import pandas as pd
df1 = pd.DataFrame({'D': {0: 'D0', 1: 'D1', 2: 'D2', 3: 'D3'}, 'B': {0: 'B0', 1: 'B1', 2: 'B2', 3: 'B3'}, 'A': {0: 'A0', 1: 'A1', 2: 'A2', 3: 'A3'}, 'C': {0: 'C0', 1: 'C1', 2: 'C2', 3: 'C3'}})
df2 = pd.DataFrame({'E': {4: 'B4', 5: 'B5', 6: 'B6', 7: 'B7'}, 'D': {4: 'C4', 5: 'C5', 6: 'C6', 7: 'C7'}, 'F': {4: 'D4', 5: 'D5', 6: 'D6', 7: 'D7'}, 'C': {4: 'A4', 5: 'A5', 6: 'A6', 7: 'A7'}})

df1

    A   B   C   D
0  A0  B0  C0  D0
1  A1  B1  C1  D1
2  A2  B2  C2  D2
3  A3  B3  C3  D3

df2

    C   D   E   F
4  A4  C4  B4  D4
5  A5  C5  B5  D5
6  A6  C6  B6  D6
7  A7  C7  B7  D7

I'm trying to determine the best way to return a dataframe that contains ONLY columns that appear in 2 OTHER dataframes.

I've come up with a solution but it's problematic because it will drop any column that has NaN.

df3 = pd.concat([df1, df2])

     A    B   C   D    E    F
0   A0   B0  C0  D0  NaN  NaN
1   A1   B1  C1  D1  NaN  NaN
2   A2   B2  C2  D2  NaN  NaN
3   A3   B3  C3  D3  NaN  NaN
4  NaN  NaN  A4  C4   B4   D4
5  NaN  NaN  A5  C5   B5   D5
6  NaN  NaN  A6  C6   B6   D6
7  NaN  NaN  A7  C7   B7   D7

df3 = df3.dropna(axis=1)

#This is the correct result I'm going for
    C   D
0  C0  D0
1  C1  D1
2  C2  D2
3  C3  D3
4  A4  C4
5  A5  C5
6  A6  C6
7  A7  C7

Or as a one-liner:

df3 = pd.concat([df1, df2]).dropna(axis=1)

I have a feeling there's an easier way to return a dataframe containing only the columns that are found in both two different dataframes (columns intersection). More elegant approach?

Upvotes: 0

Views: 1515

Answers (3)

chishaku
chishaku

Reputation: 4643

df3 = pd.concat([df1, df2], join='inner')

Upvotes: 1

Notre1
Notre1

Reputation: 101

Try this:

common_columns = list(set(df1.columns) & set(df2.columns))
pd.concat([df1, df2])[common_columns]

Result:

    C   D
0  C0  D0
1  C1  D1
2  C2  D2
3  C3  D3
4  A4  C4
5  A5  C5
6  A6  C6
7  A7  C7

Upvotes: 0

EdChum
EdChum

Reputation: 394389

You can use the intersection of the 2 df columns:

In [177]:

cols = df1.columns.intersection(df2.columns)
cols
Out[177]:
Index(['C', 'D'], dtype='object')
In [178]:

pd.concat([df1[cols],df2[cols]])
Out[178]:
    C   D
0  C0  D0
1  C1  D1
2  C2  D2
3  C3  D3
4  A4  C4
5  A5  C5
6  A6  C6
7  A7  C7

Upvotes: 2

Related Questions