subhajit saha

Reputation: 43

Not equal function not working in PySpark inner join

I have two datasets that I want to join, and I need to find out how many rows in df1 don't match any of the rows in df2, in PySpark.

I tried this code:

join = df1.join(df2, df1.studyid != df2.studyid, how='inner')

But this code does not return the expected result.

Please help me solve this problem. For more info, ping me in chat.

Thanks

Upvotes: 1

Views: 1439

Answers (1)

vladsiv

Reputation: 2946

Use leftanti:

join = df1.join(df2, df1.studyid == df2.studyid, how='leftanti')

An anti join returns the values from the left relation that have no match in the right relation. It is also referred to as a left anti join. Your inner join with != does something different: it pairs every df1 row with every df2 row that has a different studyid, which is close to a cross join and is why it doesn't give you the unmatched rows.
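As a minimal sketch of the difference (the sample values below are illustrative assumptions, not data from your question):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustrative sample data; studyid matches the column name in the question
df1 = spark.createDataFrame([(1,), (2,), (3,)], ["studyid"])
df2 = spark.createDataFrame([(2,), (4,)], ["studyid"])

# Keep only the df1 rows whose studyid has no match in df2
unmatched = df1.join(df2, df1.studyid == df2.studyid, how='leftanti')
unmatched.show()           # rows with studyid 1 and 3 (row order may vary)
print(unmatched.count())   # 2 -- the number of df1 rows with no match in df2

The count() at the end gives you the "how many rows in df1 don't match" number you asked for.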

More information: https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-join.html

Upvotes: 4
