Reputation: 137
I have two datasets.
The first dataset ONE has fields: uuid, name, latitude, longitude
The second dataset TWO has fields: uuid, lat, long
I want to do a left join in Spark between ONE and TWO so that I get the resulting dataset: uuid, name, lat, long (the same type as ONE).
This is because I know ONE.latitude and ONE.longitude will be NULL, and I want to fill those fields with TWO.lat and TWO.long respectively.
When I do the left join, the result will be something like this: uuid, name, latitude, longitude, lat, long
but I want it to be uuid, name, lat, long
I tried this:
ONE.joinWith(TWO, ONE.uuid == TWO.uuid, "left").drop("ONE.latitude, ONE.longitude")
But the result is of type Row instead of type ONE. How can I make it be of type ONE?
Upvotes: 0
Views: 205
Reputation: 1601
You can do:
ONE.join(TWO, Seq("uuid"), "left")
.withColumn("lat", col("latitude"))
.withColumn("long", col("longitude"))
.drop("latitude", "longitude")
This way, you are overwriting the lat values with the latitude values. If you only want to replace the null values in lat and keep the ones that are not null, you can do something like:
.withColumn("lat", when(col("lat").isNull, col("latitude")).otherwise(col("lat")))
Good luck!
Upvotes: 1