Reputation: 850
I have this dataset:
(apple,1)
(banana,4)
(orange,3)
(grape,2)
(watermelon,2)
and the other dataset is:
(apple,Map(Bob -> 1))
(banana,Map(Chris -> 1))
(orange,Map(John -> 1))
(grape,Map(Smith -> 1))
(watermelon,Map(Phil -> 1))
I am aiming to combine both datasets to get:
(apple,1,Map(Bob -> 1))
(banana,4,Map(Chris -> 1))
(orange,3,Map(John -> 1))
(grape,2,Map(Smith -> 1))
(watermelon,2,Map(Phil -> 1))
The code I have:
...
val counts_firstDataset = words.map(word =>
(word.firstWord, 1)).reduceByKey{case (x, y) => x + y}
Second dataset:
...
val counts_secondDataset = secondSet.map(x => (x._1,
x._2.toList.groupBy(identity).mapValues(_.size)))
I tried to use the join method:
val joined_data = counts_firstDataset.join(counts_secondDataset)
but it did not work, because join operates on pairs of [K, V]. How would I get around this issue?
Upvotes: 0
Views: 962
Reputation: 420
As the first elements (the fruit names) of both datasets are in the same order, you can combine the two lists of tuples with zip, and then use map to flatten each nested pair into a triple:
counts_firstDataset.zip(counts_secondDataset)
.map(vk => (vk._1._1, vk._1._2, vk._2._2))
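A minimal, self-contained sketch of this zip-then-map pattern using plain Scala collections in place of the RDDs (the fruit names and counts are taken from the question; the variable names are illustrative):

```scala
object ZipCombine extends App {
  // Stand-ins for counts_firstDataset and counts_secondDataset
  val countsFirst = List(("apple", 1), ("banana", 4), ("orange", 3))
  val countsSecond = List(
    ("apple", Map("Bob" -> 1)),
    ("banana", Map("Chris" -> 1)),
    ("orange", Map("John" -> 1))
  )

  // zip pairs elements positionally: ((fruit, count), (fruit, names)).
  // A pattern-matching map then flattens each nested pair into a triple.
  val combined = countsFirst.zip(countsSecond).map {
    case ((fruit, count), (_, names)) => (fruit, count, names)
  }

  combined.foreach(println)
  // (apple,1,Map(Bob -> 1))
  // (banana,4,Map(Chris -> 1))
  // (orange,3,Map(John -> 1))
}
```

Note that on real RDDs, zip additionally requires both RDDs to have the same number of partitions and the same number of elements per partition.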
Upvotes: 1
Reputation: 13154
The easiest way is just to convert to DataFrames and then join:
import spark.implicits._
val counts_firstDataset = words
.map(word => (word.firstWord, 1))
.reduceByKey{case (x, y) => x + y}
.toDF("type", "value")
val counts_secondDataset = secondSet
.map(x => (x._1,x._2.toList.groupBy(identity).mapValues(_.size)))
.toDF("type_2","map")
counts_firstDataset
.join(counts_secondDataset, 'type === 'type_2)
.drop('type_2)
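If you prefer to stay with pair RDDs, join does work here: it returns (K, (V1, V2)), and a follow-up map flattens that into the desired triple. A minimal sketch with plain Scala collections standing in for the RDDs (names and sample values assumed from the question):

```scala
object JoinFlatten extends App {
  // Stand-ins for the two keyed datasets
  val first = List("apple" -> 1, "banana" -> 4)
  val second = Map(
    "apple" -> Map("Bob" -> 1),
    "banana" -> Map("Chris" -> 1)
  )

  // Inner-join by key, then flatten (fruit, (count, names)) to a triple.
  // On RDDs: first.join(second).map { case (k, (v1, v2)) => (k, v1, v2) }
  val joined = first.flatMap { case (fruit, count) =>
    second.get(fruit).map(names => (fruit, count, names))
  }

  joined.foreach(println)
  // (apple,1,Map(Bob -> 1))
  // (banana,4,Map(Chris -> 1))
}
```

Unlike the zip approach, a join does not depend on the two datasets being in the same order or having matching partitioning.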
Upvotes: 1