Reputation: 68
I'm new to pandas so likely overlooking something but I've been searching and haven't found anything helpful yet.
What I'm trying to do is this. I have 2 dataframes. df1
has only 1 column and an unknown number of rows. df2
has an unknown number of rows also and also an unknown number of columns for each index.
Example:
df1:
0 1117454
1 1147637
2 1148945
3 1149662
4 1151543
5 1151545
6 1236268
7 1236671
8 1236673
...
300 1366962
df2:
1 2 3 4 5 6 7
8302813476 1375294 1375297 1375313 1375318 1375325 1375330 1375331
8302813477 1317422 1363270 1363288 1363262 None None None
8302813478 1187269 1187276 1149662 1147843 1147639 1236650 1236656
So what I want is to check all df1
values against df2
column 1 - n and if there is a match with any value in df1
mark the index of df2
as True else it is False.
Upvotes: 2
Views: 1297
Reputation: 294536
As an interesting numpy alternative
l1 = df1.values.ravel()
l2 = df2.values.ravel()
pd.DataFrame(
np.equal.outer(l1, l2).any(0).reshape(df2.values.shape),
df2.index, df2.columns
)
or using set
, list
and comprehension
l1 = set(df1.values.ravel().tolist())
l2 = df2.values.ravel().tolist()
pd.DataFrame(
np.array([bool(l1.intersection([d])) for d in l2]).reshape(df2.values.shape),
df2.index, df2.columns
)
Upvotes: 3
Reputation: 863701
I think you can use isin
for testing matching of Series
created from df2
by stack
with Series
created from one column df1
by squeeze
. Last reshape by unstack
:
df3 = df2.stack().isin(df1.squeeze()).unstack()
print (df3)
1 2 3 4 5 6 7
8302813476 False False False False False False False
8302813477 False False False False False False False
8302813478 False False True False False False False
Then get find all values where at least one True
by any
:
a = df3.any(axis=1)
print (a)
8302813476 False
8302813477 False
8302813478 True
dtype: bool
And last boolean indexing
:
print (a[a].index)
Int64Index([8302813478], dtype='int64')
Another solution is instead squeeze
use df1['col'].unique()
, thank you Ted Petrou:
df3 = df2.stack().isin(df1['col'].unique()).unstack()
print (df3)
1 2 3 4 5 6 7
8302813476 False False False False False False False
8302813477 False False False False False False False
8302813478 False False True False False False False
I like squeeze
more, but same output is simple selecting column of df1
:
df3 = df2.stack().isin(df1['col']).unstack()
print (df3)
1 2 3 4 5 6 7
8302813476 False False False False False False False
8302813477 False False False False False False False
8302813478 False False True False False False False
Upvotes: 3