koh-ding
koh-ding

Reputation: 135

How would I merge these two dataframes to produce the third dataframe in Spark Scala?

I'm having difficulty joining these two dataframe views because of not being able to modify specific column values in spark scala. I think I have to do a transpose/join somehow, but am not able to figure it out.

Here is the first dataframe:

  var sample_df = Seq(("john","morning","7am"),("john","night","10pm"),("bob","morning","8am"),("bob","night","11pm"),("phil","morning","9am"),("phil","night","10pm")).toDF("person","time_of_day","wake/sleep hour")

enter image description here

here is the second dataframe:

  var sample_df2 = Seq(("john","6am","11pm"),("bob","7am","2am"),("phil","8am","1am")).toDF("person","morning_earliest","night_latest")

enter image description here

and here is the resulting dataframe I'm looking to produce:

  var resulting_df = Seq(("john","morning","7am","6am"),("john","night","10pm","11pm"),("bob","morning","8am","7am"),("bob","night","11pm","2am"),("phil","morning","9am","8am"),("phil","night","10pm","1am")).toDF("person","time_of_day","wake/sleep hour","earliest/latest")

enter image description here

Any help would be greatly appreciated! Thanks and have a great day!

Upvotes: 0

Views: 141

Answers (2)

talhasagdan
talhasagdan

Reputation: 21

val df = sample_df
    .join(sample_df2,"person")

val resulting_df = df.withColumn("earliest/latest",
    when(col("time_of_day")=== "morning", $"morning_earliest")
    .otherwise($"night_latest"))
    .drop($"morning_earliest")
    .drop($"night_latest")

resulting_df.show()

Upvotes: 0

mck
mck

Reputation: 42422

sample_df.createOrReplaceTempView("df1")
sample_df2.createOrReplaceTempView("df2")

spark.sql("""
select person, time_of_day, `wake/sleep hour`, `earliest/latest`
from (
    select person, stack(2, 'morning', morning_earliest, 'night', night_latest) as (time_of_day, `earliest/latest`)
    from df2
) df
join df1
using (time_of_day, person)
""").show()

+------+-----------+---------------+---------------+
|person|time_of_day|wake/sleep hour|earliest/latest|
+------+-----------+---------------+---------------+
|  john|    morning|            7am|            6am|
|  john|      night|           10pm|           11pm|
|   bob|    morning|            8am|            7am|
|   bob|      night|           11pm|            2am|
|  phil|    morning|            9am|            8am|
|  phil|      night|           10pm|            1am|
+------+-----------+---------------+---------------+

Upvotes: 1

Related Questions