Harshit

Reputation: 1217

ReduceByKey on multidimensional tuple using scala and spark

I am trying to use reduceByKey on a multidimensional list in Scala, so that tuples are appended to a parent tuple, producing a multidimensional tuple.

In Python I append to a multidimensional list as follows, and it works perfectly:

.map(lambda z:(z[1][0][1],[[z[1][0][1],str(z[1][0][2]),str(z[1][0][3]),z[1][0][0].strftime('%Y-%m-%dT%H:%M:%SZ'),z[1][1]]])).reduceByKey(lambda a,b:a+b)

But in Scala I am unable to use reduceByKey. I am trying the following:

.map(t => (t._2._1._2,((t._2._1._2,t._2._1._3,t._2._1._4,t._2._1._1,t._2._2)))).reduceByKey(t,y => t++y)

Any hints in right direction are also welcome!

Upvotes: 0

Views: 306

Answers (1)

zero323

Reputation: 330413

A Scala Tuple*, unlike a Python tuple, is not a collection. It is a Product. Technically it represents an n-fold Cartesian product of possibly heterogeneous sets of values. Scala tuples cannot be concatenated and cannot contain more than 22 elements.
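The distinction can be seen in plain Scala 2 (a minimal sketch; the comments describe what each line does):

```scala
val t = (1, "a", 2.5)   // a Tuple3, i.e. a Product3[Int, String, Double]
t.productArity          // 3 — the number of fields, fixed at compile time

// t ++ (4, "b")        // does not compile: Tuple3 has no ++ method

// The only generic way to "flatten" a tuple is via Product,
// which erases the element types:
val xs: List[Any] = t.productIterator.toList :+ 4
```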

If you want to collect values per key you should either use some type of collection, or, even better, groupByKey.
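A sketch of both options, mirroring the question's Python code; the RDD name `rdd` and the nested-tuple shape are assumptions taken from the question:

```scala
// Collection-based approach: wrap each value in a List, then concatenate.
// List ++ List works where tuple ++ tuple does not.
rdd
  .map(t => (t._2._1._2,
             List((t._2._1._2, t._2._1._3, t._2._1._4, t._2._1._1, t._2._2))))
  .reduceByKey(_ ++ _)   // RDD[(K, List[(...)])]

// groupByKey: when every value is kept anyway, this avoids building
// intermediate lists on the map side.
rdd
  .map(t => (t._2._1._2,
             (t._2._1._2, t._2._1._3, t._2._1._4, t._2._1._1, t._2._2)))
  .groupByKey()          // RDD[(K, Iterable[(...)])]
```

Note that a binary function in Scala is written `(a, b) => a ++ b` (or `_ ++ _`), not `a, b => a ++ b` as in the question's attempt.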

See also: How should I think about Scala's Product classes?

Upvotes: 2
