Reputation: 1547
I want to find the common
rows and columns in all three dataframes: mrna
, meth
, and cna
. I concatenated the three tables by the Hugo_Symbol
. I'm expecting 343 common columns but am getting 347 common columns.
import pandas as pd
from functools import reduce
dfs = [mrna, meth, cna]
common = pd.concat(dfs, join='inner', ignore_index=True)
common = common.drop_duplicates()
common = common.loc[:, ~common.columns.duplicated()]
len(set(common.columns))
Current output:
347
Expected output:
343
Upvotes: 0
Views: 114
Reputation: 862681
I think you need for common columns names use set.intersection
- ordering should be changed (because in sets is not defined order):
out = set.intersection(*map(set,[x.columns for x in dfs]))
Or if ordering is important use reduce
with numpy.intersect1d
:
out = reduce(np.intersect1d, [x.columns for x in dfs])
In your solution common columns should be:
common = pd.concat(dfs, join='inner')
out = common.columns.tolist()
out1 = common.index.tolist()
Upvotes: 1