Anthony
Anthony

Reputation: 35978

How to get records that exist in table 1 but not in table 2?

I have two tables: excluded and kaggleresults. I am trying to find records that exist in excluded but don't exist in kaggleresults

counts:

scala> spark.sql("select * from excluded").count()
res136: Long = 4652

scala> spark.sql("select * from kaggleresults").count()
res137: Long = 4635

The difference is 17

scala> res136-res137
res139: Long = 17

I am trying to get those 17 records. I wrote the query below but it returns 38 instead.

scala> spark.sql("select * from excluded left join kaggleresults on kaggleresults.subject_id = excluded.subject_id where kaggleresults.subject_id is null").count()
res135: Long = 38

Question

What query do I need to write to get those 17 records?

Upvotes: 3

Views: 10346

Answers (4)

Jacek Laskowski
Jacek Laskowski

Reputation: 74679

Isn't it a case for LEFT_ANTI join?

scala> val excluded = (0 to 5).toDS
left: org.apache.spark.sql.Dataset[Int] = [value: int]

scala> val kaggleresults = (3 to 10).toDS
right: org.apache.spark.sql.Dataset[Int] = [value: int]

scala> excluded.join(kaggleresults, Seq("value"), "leftanti").show
+-----+
|value|
+-----+
|    0|
|    1|
|    2|
+-----+

Upvotes: 9

Eric
Eric

Reputation: 46

For the actual records (not the counts), you can use

SELECT * FROM excluded
WHERE subject_id NOT IN ( SELECT subject_id FROM kaggleresults )

However, you should not be surprised if the resultant row count does not match the difference of the two tables' individual row counts.

For example: suppose table1 has id's 1, 2, 3, 4, and 5, and table2 has id's 3, 4, 5, and 6. The row counts of table1 and of table2 are 5 and 4, respectively, for a difference of 1, but there are actually two records in table1 that have id's that do not exist in table2.

Upvotes: 1

Anurag_Soni
Anurag_Soni

Reputation: 542

using not in key word You can use the below query

Select * from excluded where subject_id not in (select subject_id from  kaggleresults)

Upvotes: 0

Mike
Mike

Reputation: 143

Use the Not In syntax along with a subquery.

SELECT ID, Name 
FROM   Table1 
WHERE  ID NOT IN (SELECT ID FROM Table2)

SQL query to find record with ID not in another table

Upvotes: 0

Related Questions