Reputation: 18601
I have a TypedPipe[(String, String, Long)]
where the first String can take only a limited number (~10) of values. I'd like to partition my output so that a folder is created for each value (i.e. 10 folders, each named after a value of the first String). This is simple to achieve in Hive, but I cannot find an elegant way to do it in Scalding. The method def partition(p: T => Boolean): (TypedPipe[T], TypedPipe[T])
splits the pipe into two parts, which is not what I'm looking for.
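To make the desired layout concrete, here is a minimal plain-Scala sketch (no Scalding involved) that writes one folder per distinct first field on the local filesystem. The names PartitionDemo and writePartitioned, the file name part-00000, and the sample rows are all made up for illustration.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Local illustration only: mimics the "one folder per key" layout, not Scalding itself.
object PartitionDemo {
  // Hypothetical helper: creates root/<first field>/part-00000 for every distinct first field.
  def writePartitioned(root: Path, rows: Seq[(String, String, Long)]): Map[String, Path] =
    rows.groupBy(_._1).map { case (key, group) =>
      val dir = Files.createDirectories(root.resolve(key))
      // Each row is written as one tab-separated line, in input order.
      val lines = group.map { case (a, b, c) => s"$a\t$b\t$c" }
      val file = dir.resolve("part-00000")
      Files.write(file, lines.mkString("\n").getBytes(StandardCharsets.UTF_8))
      key -> file
    }

  // Made-up sample data with two distinct values in the first field.
  val root: Path = Files.createTempDirectory("partitioned-demo")
  val written: Map[String, Path] = writePartitioned(
    root,
    Seq(("click", "u1", 3L), ("view", "u2", 7L), ("click", "u3", 1L))
  )
}
```

With the sample rows above, the temporary root ends up with two folders, click and view, which is the shape of output I am after at cluster scale.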
EDIT: I am using Scalding v0.13.1 and PackedAvroSource.
Upvotes: 0
Views: 256
Reputation: 1431
If you group by the field you want to partition on, you can then use PartitionedDelimitedSource to write the directory structure you need. For example:
val pipe: TypedPipe[(String, String, Long)] = ...
pipe
  .groupBy(_._1)
  .write(PartitionedDelimited[String, (String, String, Long)](args("output"), "%s"))
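For a fuller picture, here is a hedged sketch of the same idea as a complete job. It assumes a Scalding version that ships com.twitter.scalding.typed.PartitionedTsv (as far as I know, PartitionedDelimited is a trait and PartitionedTsv / PartitionedCsv are its concrete objects, so check this against your version). The job name PartitionByTypeJob and the "input" argument are made up for illustration, and the sketch writes tab-separated text rather than Avro.

```scala
import com.twitter.scalding._
import com.twitter.scalding.typed.PartitionedTsv

// Hypothetical job; "input" and "output" are assumed command-line arguments.
class PartitionByTypeJob(args: Args) extends Job(args) {
  // Assumed input: a TSV file with three columns (String, String, Long).
  val pipe: TypedPipe[(String, String, Long)] =
    TypedPipe.from(TypedTsv[(String, String, Long)](args("input")))

  pipe
    .groupBy(_._1) // key each record by its first String
    .toTypedPipe   // back to TypedPipe[(String, (String, String, Long))]
    // The key fills the "%s" path template; the full tuple is written as the row.
    .write(PartitionedTsv[String, (String, String, Long)](args("output"), "%s"))
}
```

The sink takes pairs of (partition, value), so the key produced by groupBy becomes the folder name under the output path and the original tuple becomes the file content. Since the question mentions PackedAvroSource, note that this only covers delimited text output.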
Upvotes: 1