cs_guy

Reputation: 373

.isin() with a column from a dataframe

How can I query a table using isin() with a column from another dataframe? For example, given this dataframe, df1:

| id      | rank |
|---------|------|
| SE34SER | 1    |
| SEF3445 | 2    |
| 5W4G4F  | 3    |

I want to query a table where a column in the table isin(df1.id). I tried doing so like this:

t = (
    spark.table('mytable')
    .where(sf.col('id').isin(df1.id))
    .select('*')
).show()

However it errors:

AttributeError: 'NoneType' object has no attribute 'id'

Upvotes: 3

Views: 3679

Answers (1)

Mohana B C

Reputation: 5487

Unfortunately, you can't pass another dataframe's column to the isin() method. You could collect that column's values into a list and pass the list to isin(), but that is not the better approach, because collecting pulls every value onto the driver.

Instead, do an inner join between the two dataframes.

df2 = spark.table('mytable')
# Joining on the column name keeps a single 'id' column in the result
result = df2.join(df1.select('id'), on='id', how='inner')

Upvotes: 5
