Reputation: 1568
I am trying to find the intersection of the columns of three dataframes, however np.intersect1d does not accept three arrays.
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list('BCDE'))
df3 = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list('CDEF'))
inclusive_list = np.intersect1d(df1.columns, df2.columns, df3.columns)
Error:
ValueError: The truth value of a Index is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
The inclusive_list should only include column names C & D. Any help would be appreciated. Thank you.
Upvotes: 4
Views: 9876
Reputation: 323306
You can use concat:
pd.concat([df1.head(1),df2.head(1),df3.head(1)],join='inner').columns
Out[81]: Index(['C', 'D'], dtype='object')
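This works because join='inner' keeps only the columns present in every input frame, so concatenating one row from each and reading .columns yields the shared labels. A minimal sketch with small fixed frames (the cell values are made up; only the column layout matches the question):

```python
import pandas as pd

# Fixed frames with the same column layout as the question (values are arbitrary)
df1 = pd.DataFrame([[1, 2, 3, 4]], columns=list('ABCD'))
df2 = pd.DataFrame([[5, 6, 7, 8]], columns=list('BCDE'))
df3 = pd.DataFrame([[9, 10, 11, 12]], columns=list('CDEF'))

# join='inner' drops every column that is not in all inputs;
# head(1) keeps the concatenation cheap, since only the labels matter
common = pd.concat([df1.head(1), df2.head(1), df3.head(1)], join='inner').columns
print(common)
```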
Upvotes: 2
Reputation: 51155
Why your current approach doesn't work:
intersect1d does not take N arrays; it only compares two:
numpy.intersect1d(ar1, ar2, assume_unique=False, return_indices=False)
You can see from the definition that you are passing the third array as the assume_unique parameter, and since an array cannot be treated as a single boolean, you receive a ValueError.
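To make the argument binding visible, here is a small sketch that reproduces the error with plain pd.Index objects standing in for the frames' columns (the variable names are illustrative). The third positional argument is bound to assume_unique, and intersect1d then evaluates it in a boolean context:

```python
import numpy as np
import pandas as pd

a = pd.Index(list('ABCD'))
b = pd.Index(list('BCDE'))
c = pd.Index(list('CDEF'))

# Equivalent to np.intersect1d(a, b, assume_unique=c)
try:
    np.intersect1d(a, b, c)
    failed = False
except ValueError as exc:
    # intersect1d checks `not assume_unique`, and an Index refuses
    # to be reduced to a single True/False value
    failed = True
    print(exc)
```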
You can extend the functionality of intersect1d to work on N arrays using functools.reduce:
from functools import reduce
reduce(np.intersect1d, (df1.columns, df2.columns, df3.columns))
array(['C', 'D'], dtype=object)
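reduce applies np.intersect1d pairwise, so reduce(np.intersect1d, (x, y, z)) is intersect1d(intersect1d(x, y), z). A sketch wrapping that in a helper for any number of frames; common_columns is a hypothetical name, not a pandas or NumPy function:

```python
from functools import reduce

import numpy as np
import pandas as pd


def common_columns(*frames):
    """Hypothetical helper: column labels shared by every frame.

    Returns a sorted NumPy array, because np.intersect1d sorts its output.
    """
    return reduce(np.intersect1d, (f.columns for f in frames))


df1 = pd.DataFrame(columns=list('ABCD'))
df2 = pd.DataFrame(columns=list('BCDE'))
df3 = pd.DataFrame(columns=list('CDEF'))
df4 = pd.DataFrame(columns=list('DEFG'))

three = common_columns(df1, df2, df3)
four = common_columns(df1, df2, df3, df4)
```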
A better approach
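However, the easiest approach is to just use intersection on the Index object:
df1.columns.intersection(df2.columns).intersection(df3.columns)
Index(['C', 'D'], dtype='object')
Older pandas versions also accepted df1.columns & df2.columns & df3.columns for this, but using & as a set operation on an Index was deprecated, and in recent versions it performs an element-wise logical operation instead, so prefer intersection.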
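The Index.intersection method also combines with functools.reduce for an arbitrary number of frames, and unlike np.intersect1d it returns a pandas Index rather than a NumPy array. A sketch with empty frames that only carry the question's column layout (the frames list is illustrative):

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame(columns=list('ABCD'))
df2 = pd.DataFrame(columns=list('BCDE'))
df3 = pd.DataFrame(columns=list('CDEF'))

# Chained form: each call returns a new Index
common = df1.columns.intersection(df2.columns).intersection(df3.columns)

# Same idea for any number of frames, using the unbound method as the reducer
frames = [df1, df2, df3]
common_n = reduce(pd.Index.intersection, (f.columns for f in frames))
```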
Upvotes: 6
Reputation: 942
inclusive_list = np.intersect1d(np.intersect1d(df1.columns, df2.columns), df3.columns)
Note that np.intersect1d (https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.intersect1d.html) expects exactly two arrays (ar1 and ar2).
Passing 3 arrays means that the third one is bound to the assume_unique parameter, which is expected to be a bool.
You can also use native Python set methods if you don't want to use NumPy (note that a set does not preserve column order):
inclusive_list = set(df1.columns).intersection(set(df2.columns)).intersection(set(df3.columns))
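set.intersection accepts several iterables in one call, so the extra set(...) wrappers are optional. Since the result is an unordered set, this sketch sorts it to get a predictable list:

```python
import pandas as pd

df1 = pd.DataFrame(columns=list('ABCD'))
df2 = pd.DataFrame(columns=list('BCDE'))
df3 = pd.DataFrame(columns=list('CDEF'))

# One call with several iterables; the Index objects are iterated directly
common = set(df1.columns).intersection(df2.columns, df3.columns)

# Sets are unordered, so sort when a stable order is needed
inclusive_list = sorted(common)
```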
Upvotes: 0