Reputation: 355
I have 3 DataFrames: 'u', 'join5' and 'site'.
Here is the schema of DataFrame 'u':
scala> u.printSchema()
root
 |-- split_sk: integer (nullable = true)
 |-- new_date: string (nullable = true)
Now I am creating 'join6' by joining the 'join5' and 'site' DataFrames. I have two questions:
What does 'u("split_sk")' refer to in the query below? Is it possible to use a column of DataFrame 'u' in the join condition when 'u' itself is not one of the DataFrames being joined?
What does the <=> operator mean in Scala, and in particular in the query below?
val join6 = join5.join(site, u("split_sk") <=> site("split_key") && ($"new_date" >= $"effective_dt") && ($"new_date" <= $"expiry_dt"), "left")
Upvotes: 0
Views: 57
Reputation: 812
For question 1,
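Yes, 'u("split_sk")' is the "split_sk" column of DataFrame "u". It is Spark's way of saying which DataFrame a column belongs to, similar to writing a.column1 = b.column2 in SQL, where the alias tells you which table each column comes from.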
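To make the SQL comparison concrete, here is a minimal sketch that writes one join both ways: once with DataFrame-qualified columns and once as SQL with table aliases. It assumes spark-sql is on the classpath and a local SparkSession; the object name QualifiedColumnSketch and the sample rows are hypothetical, and only the column names come from the question.

```scala
import org.apache.spark.sql.SparkSession

object QualifiedColumnSketch {
  // Hypothetical sample data; only the column names come from the question.
  def run(spark: SparkSession): (Long, Long) = {
    import spark.implicits._
    val join5 = Seq((1, "2020-01-15"), (2, "2020-02-10")).toDF("split_sk", "new_date")
    val site  = Seq((1, "2020-01-01", "2020-12-31")).toDF("split_key", "effective_dt", "expiry_dt")

    // DataFrame API: join5("split_sk") names the split_sk column of join5,
    // much like a.split_sk names a column of the table aliased as a in SQL.
    val viaApi = join5.join(site, join5("split_sk") === site("split_key"), "left")

    // The same join written in SQL with table aliases a and b.
    join5.createOrReplaceTempView("join5")
    site.createOrReplaceTempView("site")
    val viaSql = spark.sql(
      "SELECT * FROM join5 a LEFT JOIN site b ON a.split_sk = b.split_key")

    (viaApi.count(), viaSql.count())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("qualified-columns").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```

Both queries should return the same rows; the DataFrame form simply replaces the SQL alias with a reference to the DataFrame object itself.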
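To answer your other question: yes, you can reference a column of a DataFrame that is not itself part of the join, but only if that column is still present in one of the joined DataFrames. Most likely join5 was created on top of u, so it still carries the same underlying split_sk column. If it were not, Spark would fail with an AnalysisException saying the resolved attribute is missing.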
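The sketch below illustrates that resolution rule under one assumption about how join5 was built (the question does not show it): here join5 is derived from u with withColumn, so u("split_sk") resolves, while an independently built DataFrame with the same column names is rejected. The object name ForeignColumnSketch, the extra column and the sample rows are hypothetical; it assumes spark-sql on the classpath and the classic (non-Connect) DataFrame API.

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}

object ForeignColumnSketch {
  // Returns (row count of join6, whether the unrelated join was rejected).
  def run(spark: SparkSession): (Long, Boolean) = {
    import spark.implicits._
    // Hypothetical sample data; only the column names come from the question.
    val u    = Seq((1, "2020-01-15"), (2, "2020-02-10")).toDF("split_sk", "new_date")
    val site = Seq((1, "2020-01-01", "2020-12-31")).toDF("split_key", "effective_dt", "expiry_dt")

    // Assumed case: join5 is derived from u, so it keeps u's split_sk attribute
    // and u("split_sk") resolves against join5's output.
    val join5 = u.withColumn("extra", $"split_sk" * 10)
    val join6 = join5.join(site,
      u("split_sk") <=> site("split_key") &&
        ($"new_date" >= $"effective_dt") &&
        ($"new_date" <= $"expiry_dt"),
      "left")

    // Counter-case: same column names, but built independently of u, so
    // u("split_sk") points at an attribute this plan does not contain.
    val unrelated = Seq((1, "2020-01-15")).toDF("split_sk", "new_date")
    val rejected =
      try { unrelated.join(site, u("split_sk") <=> site("split_key"), "left"); false }
      catch { case _: AnalysisException => true }

    (join6.count(), rejected)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("foreign-column").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```

The exact error message differs between Spark versions, so the sketch only checks the exception type rather than its text.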
For question 2,
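<=> is not a built-in Scala operator; it is a method on Spark's Column class (also available as eqNullSafe) that performs NULL-safe equality, so this is a NULL-safe join. Unlike ===, it returns true when both sides are null, and false (rather than null) when only one side is null, so rows whose keys are null on both sides still match.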
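To show the semantics without needing Spark, this plain-Scala sketch models SQL NULL as None and compares ordinary equality (===) with NULL-safe equality (<=>) when used as a join condition. It is a simplified model of SQL's three-valued logic, not Spark code; the object and method names are hypothetical.

```scala
object NullSafeEqualitySketch {
  // None models SQL NULL; Option[Boolean] models a three-valued result
  // (Some(true), Some(false), or None = NULL/UNKNOWN).

  // Ordinary equality, like Spark's === or SQL's `=`: NULL if either side is NULL.
  def sqlEquals[A](a: Option[A], b: Option[A]): Option[Boolean] =
    for (x <- a; y <- b) yield x == y

  // NULL-safe equality, like Spark's <=> (eqNullSafe): always true or false.
  def nullSafeEquals[A](a: Option[A], b: Option[A]): Boolean = (a, b) match {
    case (None, None)       => true
    case (Some(x), Some(y)) => x == y
    case _                  => false
  }

  // A join keeps a pair of rows only when the condition is exactly true,
  // so a NULL result from === drops the pair just like false does.
  def matchingPairs[A](left: Seq[Option[A]], right: Seq[Option[A]])(
      cond: (Option[A], Option[A]) => Option[Boolean]): Seq[(Option[A], Option[A])] =
    for (l <- left; r <- right if cond(l, r).contains(true)) yield (l, r)

  def main(args: Array[String]): Unit = {
    val keys: Seq[Option[Int]] = Seq(Some(1), None)
    val plain    = matchingPairs(keys, keys)(sqlEquals[Int])
    val nullSafe = matchingPairs(keys, keys)((a, b) => Some(nullSafeEquals(a, b)))
    println(s"=== matches: $plain")
    println(s"<=> matches: $nullSafe")
  }
}
```

With ===, only the (1, 1) pair survives; with <=>, the (NULL, NULL) pair matches as well, which is exactly what a NULL-safe join adds.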
For more details, refer to the question "Spark SQL "<=>" operator".
Upvotes: 1