Reputation: 35928
I use Spark sql to load data into a val
like this
val customers = sqlContext.sql("SELECT * FROM customers")
But I have a separate txt file that contains one column CUST_ID
and 50,00 rows. i.e.
CUST_ID
1
2
3
I want my customers
val to have all customers in customers
table that are not in the TXT file.
Using Sql I would do this by SELECT * FROM customers NOT IN cust_id ('1','2','3')
How can I do this using Spark?
I've read the textFile and I can print rows of it but I'm not sure how to match this with my sql query
scala> val custids = sc.textFile("cust_ids.txt")
scala> custids.take(4).foreach(println)
CUST_ID
1
2
3
Upvotes: 3
Views: 88
Reputation: 3587
You can import your text file as a dataframe and do a left outer join:
val customers = Seq(("1", "AAA", "shipped"), ("2", "ADA", "delivered") , ("3", "FGA", "never received")).toDF("id","name","status")
val custId = Seq(1,2).toDF("custId")
customers.join(custId,'id === 'custId,"leftOuter")
.where('custId.isNull)
.drop("custId")
.show()
+---+----+--------------+
| id|name| status|
+---+----+--------------+
| 3| FGA|never received|
+---+----+--------------+
Upvotes: 3