Scott

Reputation: 137

How to left join two Spark Datasets (special case)

I have two datasets.

The first dataset ONE has fields: uuid, name, latitude, longitude

The second dataset TWO has fields: uuid, lat, long
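For concreteness, I'm assuming element types roughly like this (the field types here are guesses on my part):

case class ONE(uuid: String, name: String, latitude: Option[Double], longitude: Option[Double])
case class TWO(uuid: String, lat: Option[Double], long: Option[Double])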

I want to do a left join in Spark between ONE and TWO so that I get the resulting dataset: uuid, name, lat, long (the same type as ONE).

This is because I know ONE.latitude and ONE.longitude will be NULL. I want to fill those fields with TWO.lat and TWO.long respectively.

When I do the left join, the result will be something like this: uuid, name, latitude, longitude, lat, long

but I want it to be uuid, name, lat, long

I tried this:

ONE.joinWith(TWO, ONE.uuid == TWO.uuid, "left").drop("ONE.latitude, ONE.longitude")

But the result I get is of type Row instead of type ONE. How can I make it type ONE?

Upvotes: 0

Views: 205

Answers (1)

vilalabinot

Reputation: 1601

You can do:

import org.apache.spark.sql.functions.{col, when}

ONE.join(TWO, Seq("uuid"), "left")        // join on the shared uuid column
   .withColumn("lat", col("latitude"))    // overwrite lat with ONE.latitude
   .withColumn("long", col("longitude"))  // overwrite long with ONE.longitude
   .drop("latitude", "longitude")         // keep only uuid, name, lat, long

This way, lat is overwritten with the values from latitude. If you only want to replace the null values in lat and keep the ones that are not null, you can do something like:

.withColumn("lat", when(col("lat").isNull, col("latitude")).otherwise(col("lat")))
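If you also want to end up with a typed Dataset rather than a DataFrame of Rows, one option is to coalesce the coordinate columns and call .as[...] at the end. This is a minimal sketch, assuming a SparkSession named spark is in scope and using a hypothetical target case class Result with the column names from the question:

import org.apache.spark.sql.functions.{coalesce, col}
import spark.implicits._   // provides the encoder for .as[Result]

// Hypothetical target type with the column names the question asks for.
case class Result(uuid: String, name: String, lat: Option[Double], long: Option[Double])

val result = ONE.join(TWO, Seq("uuid"), "left")
  .select(
    col("uuid"),
    col("name"),
    coalesce(col("lat"), col("latitude")).as("lat"),     // prefer TWO.lat, fall back to ONE.latitude
    coalesce(col("long"), col("longitude")).as("long"))  // prefer TWO.long, fall back to ONE.longitude
  .as[Result]   // typed Dataset[Result] instead of Row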

Good luck!

Upvotes: 1
