gstvolvr

Sorting output of groupBy in Scalding

I am trying to sort the output of a groupBy statement using Scalding.

My dataset looks like this:

Src           Eqid      Version  Datetime                                 Lat      Lon        Magnitude  Depth  NST  Region
ci            15214001  0        Tuesday, September 11, 2012 12:31:37 UTC  33.0110  -115.5330  1.3        2.20   18   Southern California
ci            15213993  0        Tuesday, September 11, 2012 12:23:34 UTC  35.3713  -118.5395  2.6        2.40   55   Central California

This is what I have been trying:

sourceFromArg(args, "input").read
 .groupBy('Region) { _.average('Magnitude -> 'avgMag) }
 .project('Region, 'avgMag)
 .write(sourceFromArg(args, "output"))

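I know that I can call

.sortBy(field)

inside the groupBy, but that only orders the rows within each group before they are aggregated. It cannot sort on the field I actually want (avgMag), because that field does not exist until the average has been computed.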
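As a rough model of what the pipeline above computes, here is a plain Scala collections sketch (not Scalding itself; `Quake`, `RegionAverages` and `averageByRegion` are hypothetical names, and only the Region and Magnitude columns are used). Each region collapses to a single (region, avgMag) entry, and avgMag only exists once the whole group has been consumed, which is why a sort applied inside the same group step cannot order by it.

```scala
// Plain Scala collections model of the question's pipeline (not Scalding code).
final case class Quake(region: String, magnitude: Double)

object RegionAverages {
  // Mirrors groupBy('Region) { _.average('Magnitude -> 'avgMag) }:
  // all rows of a region are collected first, then reduced to one average.
  def averageByRegion(rows: Seq[Quake]): Map[String, Double] =
    rows.groupBy(_.region).map { case (region, group) =>
      region -> group.map(_.magnitude).sum / group.size
    }
}
```

The result is a `Map`, which carries no ordering by average; ordering by avgMag needs a separate step after the aggregation.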

Any ideas on how I can sort the output by average magnitude?

Answers (1)

gstvolvr

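This approach works, but it requires a second groupBy: insert a constant dummy field so that every averaged row falls into the same group, then sort that single group by avgMag in descending order. Because all rows share one key, the second step runs on a single reducer, which should be acceptable here since the first groupBy leaves only one row per region.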

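  sourceFromArg(args, "input").read
    .groupBy('Region) { _.average('Magnitude -> 'avgMag) }
    .insert('dummy, 1)                               // same constant key on every row
    .groupBy('dummy) { _.sortBy('avgMag).reverse }   // one group, sorted by avgMag descending
    .project('Region, 'avgMag)                       // drop the 'dummy field
    .write(sourceFromArg(args, "output"))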
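To make the two steps concrete outside Hadoop, here is a rough plain Scala collections model of the pipeline above (not Scalding code; `RegionAvg`, `SortedAverages`, `averages` and `sortedDesc` are hypothetical names). The constant key `1` plays the role of `'dummy`, so grouping on it yields exactly one group, which is sorted ascending and then reversed, like `sortBy('avgMag).reverse`.

```scala
// Plain Scala collections model of the two-step answer (not Scalding code).
final case class RegionAvg(region: String, avgMag: Double)

object SortedAverages {
  // Step 1, like groupBy('Region) { _.average('Magnitude -> 'avgMag) }:
  // one RegionAvg per region.
  def averages(rows: Seq[(String, Double)]): Seq[RegionAvg] =
    rows.groupBy(_._1).toSeq.map { case (region, group) =>
      RegionAvg(region, group.map(_._2).sum / group.size)
    }

  // Step 2, like .insert('dummy, 1).groupBy('dummy) { _.sortBy('avgMag).reverse }:
  // every row gets the same key, so there is exactly one group to sort.
  def sortedDesc(rows: Seq[(String, Double)]): Seq[RegionAvg] =
    averages(rows)
      .map(r => (1, r))                     // insert('dummy, 1)
      .groupBy { case (dummy, _) => dummy } // groupBy('dummy): a single group
      .values
      .flatMap { group =>
        // sort ascending by avgMag, then reverse for descending order
        group.map(_._2).sortWith(_.avgMag < _.avgMag).reverse
      }
      .toSeq
}
```

With the two sample rows from the question, `sortedDesc` should place Central California (2.6) ahead of Southern California (1.3).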
