Reputation: 73
How to retrieve nothing (an empty result) out of a Spark dataframe?
I need something like this:
df.where("1" === "2")
I need this so that I can do a left join with another dataframe. Basically, I am trying to avoid data skew while joining two dataframes by splitting the rows with null and non-null key columns, joining them separately, and then taking a union of the results.
df1 has 300M records, out of which 200M records have null keys. df2 has another 300M records.
So to join them, I split df1 into null-key and non-null-key parts and join each with df2. To join the null-key dataframe with df2, I don't need any records from df2.
I can just add the columns from df2 to the null-key part of df1, but I am curious whether Spark has something like
df.where("1" === "2")
as we do in RDBMS SQL.
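The splitting strategy described above can be sketched as follows. This is a minimal illustration, not the asker's actual code: the key column name `key` and the variables `df1`/`df2` are assumptions, and `df2`'s columns are appended to the null-key rows as null literals rather than via a join, since those rows can never match.

```scala
import org.apache.spark.sql.functions._

// Split df1 by whether the join key is null (column name `key` is assumed).
val nullKeys    = df1.where(col("key").isNull)
val nonNullKeys = df1.where(col("key").isNotNull)

// Join only the non-null keys with df2; null keys could never match anyway.
val joined = nonNullKeys.join(df2, Seq("key"), "left")

// For the null-key rows, append df2's non-key columns as typed nulls
// instead of joining, so both halves end up with the same schema.
val df2Cols = df2.columns.filterNot(_ == "key")
val nullsPadded = df2Cols.foldLeft(nullKeys) { (acc, c) =>
  acc.withColumn(c, lit(null).cast(df2.schema(c).dataType))
}

// unionByName matches columns by name, so column order does not matter.
val result = joined.unionByName(nullsPadded)
```

With this approach the 200M null-key rows never enter the shuffle for the join at all, which is what removes the skew.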
Upvotes: 0
Views: 886
Reputation: 41957
where calls filter internally, so you can use filter as
import org.apache.spark.sql.functions._
df.filter(lit(1) === lit(2))
or
import org.apache.spark.sql.functions._
df.filter(expr("1 = 2"))
or
df.filter("1 = 2")
or
df.filter("false")
or
import org.apache.spark.sql.functions._
df.filter(lit(false))
Any expression that evaluates to false in the filter function would work.
Upvotes: 1
Reputation:
There are many different ways, like limit:
df.limit(0)
where with a Column expression:
import org.apache.spark.sql.functions._
df.where(lit(false))
where with a String expression:
df.where("false")
1 = 2 expressed as
df.where("1 = 2")
or
df.where(lit(1) === lit(2))
would work as well, but are more verbose than required.
Upvotes: 4