Marsellus Wallace
Marsellus Wallace

Reputation: 18601

Vertical Partitioning in Scalding

I have a TypedTipe[(String, String, Long)] where the first String can assume only a limited (~10) number of values. I'd like to partition my output so that a folder is created for each type (I.E. 10 folders with the name of the first String). This is simple to achieve in Hive, however I cannot find an elegant way to do it in Scalding. The method def partition(p: T => Boolean): (TypedPipe[T], TypedPipe[T]) breaks the pipe in 2 parts but does not do what I'm looking for.

EDIT

Upvotes: 0

Views: 256

Answers (1)

Dan Osipov
Dan Osipov

Reputation: 1431

If you group by the field you want to partition by, you can then use PartitionedDelimitedSource to write the directory structure as needed. Ex:

val pipe: TypedPipe[(String, String, Long)] = ...
pipe
    .groupBy(_._1)
    .write(PartitionedDelimited[String, (String, String, Long)](args("output"), "%s"))

Upvotes: 1

Related Questions