user2044234

Reputation: 23

Perform a Correlated Scalar SubQuery in Spark Dataframe Java API (spark v2.3.0)

I have read that in Spark you can easily do a correlated scalar subquery like so:

select
    column1,
    (select column2 from table2 where table2.some_key = table1.id)
from table1
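
For context, that SQL does run from Java once the inputs are registered as temporary views. A minimal sketch (assuming a SparkSession named spark and Datasets table1 and table2 like the ones below):

table1.createOrReplaceTempView("table1");
table2.createOrReplaceTempView("table2");

Dataset<Row> result = spark.sql(
    "select column1, " +
    "       (select column2 from table2 where table2.some_key = table1.id) " +
    "from table1");
result.show();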

What I have not figured out is how to do this in the DataFrame API. The best I can come up with is a join. The problem is that in my specific case I am joining against an enum-like lookup table that actually applies to more than one column.

Below is an example of the DataFrame code.

import static org.apache.spark.sql.functions.col;

Dataset<Row> table1 = getTable1FromSomewhere();
Dataset<Row> table2 = getTable2FromSomewhere();

// Join the same lookup table twice, once per key column,
// giving each join its own alias so the columns can be told apart.
table1
    .as("table1")
    .join(table2.as("table2"),
            col("table1.first_key").equalTo(col("table2.key")), "left")
    .join(table2.as("table3"),
            col("table1.second_key").equalTo(col("table3.key")), "left")
    .select(col("table1.*"),
            col("table2.description").as("first_key_description"),
            col("table3.description").as("second_key_description"))
    .show();

Any help would be greatly appreciated on figuring out how to do this in the DataFrame API.

Upvotes: 2

Views: 1578

Answers (1)

user9969978

Reputation: 26

What I have not figured out is how to do this in the DataFrame API.

Because there is simply no DataFrame API that can express that directly (without an explicit join). That may change in the future.
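
One workaround, not part of this answer but a hedged sketch for the enum-like case in the question: if table2 is small, collect it into a map on the driver and wrap the lookup in a UDF, so the same "subquery" can be applied to any number of key columns without repeated joins. The column names and types (integer key, string description) are assumptions for illustration.

import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;

// Collect the small lookup table to the driver.
Map<Integer, String> lookup = new HashMap<>();
for (Row r : table2.collectAsList()) {
    lookup.put(r.getInt(r.fieldIndex("key")),
               r.getString(r.fieldIndex("description")));
}

// Register a serializable UDF that closes over the map
// (UDF1 extends Serializable, and HashMap is serializable).
spark.udf().register("describe",
    (UDF1<Integer, String>) lookup::get, DataTypes.StringType);

table1
    .withColumn("first_key_description", callUDF("describe", col("first_key")))
    .withColumn("second_key_description", callUDF("describe", col("second_key")))
    .show();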

Upvotes: 1
