Sc0rpion

Reputation: 73

Spark Data frame select nothing

How can I retrieve nothing (an empty result) from a Spark DataFrame?

I need something like this:

df.where("1" === "2")

I need this so that I can do a left join with another DataFrame. Basically, I am trying to avoid data skew while joining two DataFrames by splitting the rows with null and non-null keys, joining them separately, and then unioning the results.

df1 has 300M records, out of which 200M have null keys. df2 has another 300M records.

So to join them, I split df1 into its null-key and non-null-key parts and join each with df2. To join the null-key DataFrame with df2, I don't need any records from df2.

I could just add the columns from df2 to the null-key df1, but I am curious to see whether Spark has something like this:

df.where("1" === "2")

As we do in SQL on an RDBMS.
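
For clarity, here is a minimal sketch of the split-and-union join I mean (the SparkSession setup, the toy data, and the column name key are placeholders, not my real schema):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("skew-split-join").getOrCreate()
import spark.implicits._

// Toy stand-ins for my real DataFrames
val df1 = Seq((Some(1), "a"), (None, "b"), (None, "c")).toDF("key", "v1")
val df2 = Seq((1, "x"), (2, "y")).toDF("key", "v2")

// Split df1 into the null-key and non-null-key parts
val df1NullKeys    = df1.where(col("key").isNull)
val df1NonNullKeys = df1.where(col("key").isNotNull)

// Only the non-null part actually needs df2's rows
val joinedNonNull = df1NonNullKeys.join(df2, Seq("key"), "left")

// For the null-key part I only need df2's columns, not its rows,
// hence the wish for a filter that matches nothing
val joinedNull = df1NullKeys.join(df2.where(lit(false)), Seq("key"), "left")

val result = joinedNonNull.union(joinedNull)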

Upvotes: 0

Views: 886

Answers (2)

Ramesh Maharjan

Reputation: 41957

The where function calls the filter function internally, so you can use filter as follows:

import org.apache.spark.sql.functions._
df.filter(lit(1) === lit(2))

or

import org.apache.spark.sql.functions._
df.filter(expr("1 = 2"))

or

df.filter("1 = 2")

or

df.filter("false")

or

import org.apache.spark.sql.functions._
df.filter(lit(false))

Any expression that evaluates to false inside the filter function would work.
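
For example, a quick self-contained check (the sample data here is only an assumption for illustration):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("empty-filter").getOrCreate()
import spark.implicits._

val df = Seq((1, "a"), (2, "b")).toDF("id", "value")

// Each of these keeps df's schema but returns zero rows
df.filter(lit(1) === lit(2)).count()  // 0
df.filter("1 = 2").count()            // 0
df.filter(lit(false)).count()         // 0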

Upvotes: 1

user9873150

Reputation:

There are many different ways, like limit:

df.limit(0)

where with Column:

import org.apache.spark.sql.functions._

df.where(lit(false))

where with String expression:

df.where("false")

1 = 2 expressed as

df.where("1 = 2")

or

df.where(lit(1) === lit(2))

would work as well, but they are more verbose than necessary.
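
Applied to the left-join use case from the question, limit(0) keeps df2's schema while dropping its rows; a minimal sketch (the data and column names here are assumptions for illustration):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("limit-zero-join").getOrCreate()
import spark.implicits._

val df1NullKeys = Seq((Option.empty[Int], "b"), (Option.empty[Int], "c")).toDF("key", "v1")
val df2         = Seq((1, "x"), (2, "y")).toDF("key", "v2")

// df2.limit(0) has df2's schema but no rows, so the left join simply
// appends null-valued df2 columns to every null-key row of df1
val joined = df1NullKeys.join(df2.limit(0), Seq("key"), "left")
joined.show()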

Upvotes: 4
