Reputation: 33
I'm kind of stuck trying to solve a problem in PySpark. After doing some calculations with the map function, I have an RDD that contains a list of dicts in this way:
[{key1: tuple1}, {key1: tuple2}....{key2: tuple1}, {keyN: tupleN}]
I want to build, for each key, a list with all the tuples that share that key, obtaining something like:
[{key1: [tuple1, tuple2, tuple3...]}, {key2: [tuple1, tuple2....]}]
I think an example is more illustrative:
[{0: (0, 1.0)}, {0: (1, 0.0)}, {1: (0, 0.0)}, {1: (1, 1.0)}, {2:(0,0.0)}... ]
And I would like to obtain a list of dicts like this:
[{0: [(0, 1.0), (1, 0.0)]}, {1: [(0, 0.0), (1, 1.0)]}, {2: [(0, 0.0), ...]}, ...]
I'm trying to avoid using the "combineByKey" function because it takes too long. Is there any way to do this with "reduceByKey"?
Thank you all very much.
Upvotes: 1
Views: 1019
Reputation: 9863
Here's a possible solution that doesn't use reduceByKey, just Python built-in functions:
from collections import defaultdict

inp = [{0: (0, 1.0)}, {0: (1, 0.0)}, {1: (0, 0.0)},
       {1: (1, 1.0)}, {2: (0, 0.0)}]

# Gather all tuples that share a key into one list.
out = defaultdict(list)
for d in inp:
    for k, v in d.items():
        out[k].append(v)

# Rebuild the list-of-dicts shape from the question.
out = [{k: v} for k, v in out.items()]
print(out)
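
If you do want to stay inside Spark, here's a minimal sketch of how the same grouping could be done with reduceByKey, assuming the RDD elements are single-key dicts exactly like in your example (the variable names are just for illustration). Each dict is unpacked into a (key, [tuple]) pair and the per-key lists are concatenated:

from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# Hypothetical input RDD matching the question's example data.
rdd = sc.parallelize([{0: (0, 1.0)}, {0: (1, 0.0)}, {1: (0, 0.0)},
                      {1: (1, 1.0)}, {2: (0, 0.0)}])

grouped = (rdd
           # unpack each single-entry dict into a (key, [tuple]) pair
           .flatMap(lambda d: [(k, [v]) for k, v in d.items()])
           # concatenate the per-key lists
           .reduceByKey(lambda a, b: a + b)
           # rebuild the list-of-dicts shape from the question
           .map(lambda kv: {kv[0]: kv[1]}))

print(grouped.collect())
# e.g. [{0: [(0, 1.0), (1, 0.0)]}, {1: [(0, 0.0), (1, 1.0)]}, {2: [(0, 0.0)]}]

Note that concatenating lists in reduceByKey still shuffles all the values per key, so it isn't guaranteed to be faster than combineByKey; whether it helps depends on your data.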
Upvotes: 1