ishwar

Reputation: 298

I want to do the same transformation in Python that I did in Scala.

I'm new to Python.

Scala Code:

rdd1 contains the records as strings:

val rdd1 = sc.parallelize(Seq("[Canada,47;97;33;94;6]", "[Canada,59;98;24;83;3]", "[Canada,77;63;93;86;62]"))



val resultRDD = rdd1.map { r =>
  val Array(country, values) = r.replaceAll("\\[|\\]", "").split(",")
  country -> values
}.reduceByKey((a, b) => a.split(";").zip(b.split(";")).map {
  case (i1, i2) => i1.toInt + i2.toInt }.mkString(";"))

Output:

Country,Values  //I added the column header to show that the output should have two columns
Canada,183;258;150;263;71

Upvotes: 2

Views: 204

Answers (2)

jxc

Reputation: 13998

Edit: the OP wants to use map instead of flatMap, so I adjusted flatMap to map; with map, you just need to take the first item out of the list comprehension, hence map(lambda x: [...][0]).

Side note: the above change is valid only in this particular case, where the list comprehension returns a list with a single item. For more general cases, you might need two map()s to replace what flatMap() does.
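For reference, a minimal sketch of the flatMap form this answer originally used; flatMap flattens the one-item list away, so no [0] indexing is needed:

rdd1.flatMap(lambda x: [ (e[0], tuple(map(int, e[1].split(';')))) for e in [x.strip('][').split(',')] ])
# yields the same pairs as the map(...)[0] version below, e.g. ('Canada', (47, 97, 33, 94, 6))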

One way with RDDs is to use a list comprehension to strip, split and convert each string into a key-value pair, with the country as key and a tuple of numbers as value. Since the list comprehension yields a one-item list, we take its first element inside map() (per the edit above), then use reduceByKey to do the calculation and mapValues to convert the resulting tuple back into a string:
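A sketch of the rdd1 setup in PySpark, using the raw strings from the question:

rdd1 = sc.parallelize(["[Canada,47;97;33;94;6]",
                       "[Canada,59;98;24;83;3]",
                       "[Canada,77;63;93;86;62]"])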

rdd1.map(lambda x: [ (e[0], tuple(map(int, e[1].split(';')))) for e in [x.strip('][').split(',')] ][0]) \
    .reduceByKey(lambda x,y: tuple([ x[i]+y[i] for i in range(len(x))]) ) \
    .mapValues(lambda x: ';'.join(map(str,x))) \
    .collect()

output after map:

[('Canada', (47, 97, 33, 94, 6)),
 ('Canada', (59, 98, 24, 83, 3)),
 ('Canada', (77, 63, 93, 86, 62))]

output after reduceByKey:

[('Canada', (183, 258, 150, 263, 71))]

output after mapValues:

[('Canada', '183;258;150;263;71')]
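If the nested comprehension feels dense, here is an equivalent sketch with a named helper (parse is a hypothetical name; same logic, just unrolled):

def parse(line):
    # strip the surrounding brackets, then split off the country
    country, values = line.strip('][').split(',')
    return country, tuple(int(v) for v in values.split(';'))

rdd1.map(parse) \
    .reduceByKey(lambda x, y: tuple(a + b for a, b in zip(x, y))) \
    .mapValues(lambda v: ';'.join(map(str, v))) \
    .collect()
# [('Canada', '183;258;150;263;71')]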

Upvotes: 2

Jayadeep Jayaraman

Reputation: 2825

You can do something like this:

import pyspark.sql.functions as f
from pyspark.sql.functions import col

myRDD = sc.parallelize([('Canada', '47;97;33;94;6'), ('Canada', '59;98;24;83;3'),('Canada', '77;63;93;86;62')])

df = myRDD.toDF()

>>> df.show(10)
+------+--------------+
|    _1|            _2|
+------+--------------+
|Canada| 47;97;33;94;6|
|Canada| 59;98;24;83;3|
|Canada|77;63;93;86;62|
+------+--------------+
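As an aside, passing column names to toDF avoids the default _1/_2 names (the select below would then reference country and values instead):

df = myRDD.toDF(["country", "values"])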


df.select(
        col("_1").alias("country"),
        f.split("_2", ";").alias("values"),
        f.posexplode(f.split("_2", ";")).alias("pos", "val")
    )\
    .drop("val")\
    .select(
        "country",
        f.concat(f.lit("position"),f.col("pos").cast("string")).alias("name"),
        f.expr("values[pos]").alias("val")
    )\
    .groupBy("country").pivot("name").agg(f.sum("val"))\
    .show()

+-------+---------+---------+---------+---------+
|country|position0|position1|position2|position3|position4|
+-------+---------+---------+---------+---------+---------+
| Canada|    183.0|    258.0|    150.0|    263.0|     71.0|
+-------+---------+---------+---------+---------+---------+
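If you need the exact Country,Values string from the question, one possible follow-up (a sketch, assuming pivoted is the DataFrame produced by the pivot above) is to cast the sums to int and stitch the position columns back together with concat_ws:

pos_cols = ["position0", "position1", "position2", "position3", "position4"]
pivoted.select(
    "country",
    f.concat_ws(";", *[f.col(c).cast("int").cast("string") for c in pos_cols]).alias("values")
).show()

# expected, given the sums above:
# +-------+------------------+
# |country|            values|
# +-------+------------------+
# | Canada|183;258;150;263;71|
# +-------+------------------+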


Upvotes: 1
