Reputation: 115
I have a function in Scala that I call like this:
val evega = concat.map(_.split(",")).keyBy(_(0)).groupByKey().map{case (k, v) => (k, f(v))}
My function f is:
import java.time.LocalDate
import java.time.format.DateTimeFormatter
val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd")
implicit val localDateOrdering: Ordering[LocalDate] = Ordering.by(_.toEpochDay)
// Difference in day-of-year between the latest and the earliest date
def f(v: Array[String]): Int = {
  val parsedDates = v.map(LocalDate.parse(_, formatter))
  parsedDates.max.getDayOfYear - parsedDates.min.getDayOfYear
}
And this is the error I get:
found : Iterable[Array[String]]
required: Array[String]
I already tried using:
val evega = concat.map(_.split(",")).keyBy(_(0)).groupByKey().map{case (k, v) => (k, for (date <- v) f(date))}
But that produces even more errors.
Just to get a better picture, data in concat is:
1974,1974-06-22
1966,1966-07-20
1954,1954-06-19
1994,1994-06-27
1954,1954-06-26
2006,2006-07-04
2010,2010-07-07
1990,1990-06-30
...
concat is of type RDD[String]. How can I properly iterate over the grouped values and get a single Int per key from f?
Upvotes: 2
Views: 245
Reputation: 61666
The RDD types along your pipeline are:
concat.map(_.split(","))
gives an RDD[Array[String]], with elements such as
Array("1954", "1954-06-19")
concat.map(_.split(",")).keyBy(_(0))
gives an RDD[(String, Array[String])], with elements such as
("1954", Array("1954", "1954-06-19"))
concat.map(_.split(",")).keyBy(_(0)).groupByKey()
gives an RDD[(String, Iterable[Array[String]])], with elements such as
("1954", Iterable(Array("1954", "1954-06-19"), Array("1954", "1954-06-24")))
Thus when you map at the end, the type of the values v is Iterable[Array[String]], not the Array[String] that f expects.
Since each input line looks like "1974,1974-06-22", one solution is to replace your keyBy transformation with a map that keeps only the date as the value:
concat.map(_.split(",")).map(x => x(0) -> x(1)).groupByKey().map{case (k, v) => (k, f(v.toArray))}
Indeed, .map(x => x(0) -> x(1)) (instead of .map(x => x(0) -> x), which keyBy(_(0)) is syntactic sugar for) uses the second element of the split array as the value instead of the array itself. This gives an RDD[(String, String)] at this second step rather than an RDD[(String, Array[String])], and therefore an RDD[(String, Iterable[String])] after groupByKey(). Since f takes an Array[String], the grouped values are converted with v.toArray (alternatively, declare f as taking an Iterable[String]).
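To check the types without a cluster, here is a minimal sketch of the same pipeline on plain Scala collections. It is only an approximation: a Seq stands in for the RDD, groupBy stands in for groupByKey, and the object name DaySpan and the three sample lines are made up for illustration. I also assume f is changed to accept an Iterable[String], which removes the need for toArray.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical wrapper object, only here to make the sketch self-contained
object DaySpan {
  val formatter: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd")
  implicit val localDateOrdering: Ordering[LocalDate] = Ordering.by(_.toEpochDay)

  // Same logic as f in the question, but accepting any Iterable[String]
  def f(v: Iterable[String]): Int = {
    val parsedDates = v.map(LocalDate.parse(_, formatter))
    parsedDates.max.getDayOfYear - parsedDates.min.getDayOfYear
  }

  // Stand-in for the RDD[String]; the sample lines are illustrative only
  val concat: Seq[String] = Seq("1954,1954-06-19", "1974,1974-06-22", "1954,1954-06-26")

  val evega: Map[String, Int] = concat
    .map(_.split(","))                  // Seq[Array[String]]
    .map(x => x(0) -> x(1))             // Seq[(String, String)]: year -> date
    .groupBy(_._1)                      // Map[String, Seq[(String, String)]]
    .map { case (k, pairs) => (k, f(pairs.map(_._2))) } // keep only the dates

  def main(args: Array[String]): Unit =
    evega.toSeq.sorted.foreach(println) // prints (1954,7) then (1974,0)
}
```

With Spark, the only differences should be that concat is an RDD[String] and that groupByKey() already hands you the dates as an Iterable[String], so the pairs.map(_._2) step is not needed.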
Upvotes: 2