Anon

Reputation: 1547

How do I find common rows and columns in multiple (more than 2) dataframes using pandas?

I want to find the common rows and columns in all three dataframes: mrna, meth, and cna. I concatenated the three tables on the Hugo_Symbol column. I'm expecting 343 common columns but am getting 347.

import pandas as pd
from functools import reduce

dfs = [mrna, meth, cna]
common = pd.concat(dfs, join='inner', ignore_index=True)
common = common.drop_duplicates()
common = common.loc[:, ~common.columns.duplicated()]
len(set(common.columns))

Current output:

347

Expected output:

343

Upvotes: 0

Views: 114

Answers (1)

jezrael

Reputation: 862681

I think for the common column names you need set.intersection; note the ordering may change, because sets have no defined order:

out = set.intersection(*map(set,[x.columns for x in dfs]))
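A minimal sketch of the set-based approach, using small hypothetical frames standing in for mrna, meth, and cna (only the column labels matter here):

```python
import pandas as pd

# Hypothetical stand-ins for mrna, meth, and cna
mrna = pd.DataFrame(columns=['Hugo_Symbol', 'S1', 'S2', 'S3'])
meth = pd.DataFrame(columns=['Hugo_Symbol', 'S2', 'S3', 'S4'])
cna = pd.DataFrame(columns=['Hugo_Symbol', 'S3', 'S2'])

dfs = [mrna, meth, cna]
# Intersect the column labels of all frames at once
out = set.intersection(*map(set, [x.columns for x in dfs]))
print(out)  # {'Hugo_Symbol', 'S2', 'S3'} in some order
```

Because the result is a set, it also de-duplicates labels, which is exactly what you want when counting common columns.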

Or, if ordering is important, use reduce with numpy.intersect1d:

import numpy as np

out = reduce(np.intersect1d, [x.columns for x in dfs])
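With the same hypothetical frames, the reduce version looks like this; note that numpy.intersect1d returns the intersection sorted, not in the original column order:

```python
import numpy as np
import pandas as pd
from functools import reduce

# Hypothetical stand-ins for mrna, meth, and cna
mrna = pd.DataFrame(columns=['Hugo_Symbol', 'S1', 'S2', 'S3'])
meth = pd.DataFrame(columns=['Hugo_Symbol', 'S2', 'S3', 'S4'])
cna = pd.DataFrame(columns=['Hugo_Symbol', 'S3', 'S2'])
dfs = [mrna, meth, cna]

# Fold intersect1d pairwise across all column Index objects
out = reduce(np.intersect1d, [x.columns for x in dfs])
print(out)  # ['Hugo_Symbol' 'S2' 'S3'], sorted
```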

In your solution, drop ignore_index=True so the row labels survive; then the common columns and the row labels are:

common = pd.concat(dfs, join='inner')
out = common.columns.tolist()
out1 = common.index.tolist()
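A small sketch of what that concat gives you, again with hypothetical frames indexed by Hugo_Symbol. With axis=0 and join='inner', only the shared columns are kept, while the index stacks every frame's row labels (duplicates show which genes appear in several frames):

```python
import pandas as pd

# Hypothetical frames indexed by Hugo_Symbol
mrna = pd.DataFrame({'S1': [1, 2], 'S2': [3, 4]}, index=['TP53', 'BRCA1'])
meth = pd.DataFrame({'S2': [5, 6], 'S3': [7, 8]}, index=['TP53', 'EGFR'])
cna = pd.DataFrame({'S2': [9], 'S4': [0]}, index=['TP53'])
dfs = [mrna, meth, cna]

# join='inner' keeps only columns present in every frame
common = pd.concat(dfs, join='inner')
out = common.columns.tolist()   # shared columns: ['S2']
out1 = common.index.tolist()    # all row labels stacked, with repeats
```

If you need the rows common to all three frames, intersect the indexes instead, e.g. `set.intersection(*map(set, [x.index for x in dfs]))`.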

Upvotes: 1

Related Questions