Reputation: 650
I am trying to sort the output of a groupBy statement using Scalding.
My dataset looks like this:
Src Eqid Version Datetime Lat Lon Magnitude Depth NST Region
ci 15214001 0 Tuesday, September 11, 2012 12:31:37 UTC 33.0110 -115.5330 1.3 2.20 18 Southern California
ci 15213993 0 Tuesday, September 11, 2012 12:23:34 UTC 35.3713 -118.5395 2.6 2.40 55 Central California
This is what I have been trying:
sourceFromArg(args, "input").read
.groupBy('Region) { _.average('Magnitude -> 'avgMag) }
.project('Region, 'avgMag)
.write(sourceFromArg(args, "output"))
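I know that I can call
.sortBy(field)
inside the groupBy, but that only orders the rows going into each group. The field I actually want to sort on (avgMag) is produced by the groupBy itself, so it is not available there.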
Any ideas on how I can sort based on average magnitude?
Upvotes: 1
Views: 842
Reputation: 650
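This approach works, but it requires a second groupBy. It inserts a constant 'dummy field so that every row lands in a single group, then sorts that group by avgMag (reversed, so the largest average comes first):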
.groupBy('Region) { _.average('Magnitude -> 'avgMag) }
.insert('dummy, 1)
.groupBy('dummy) { _.sortBy('avgMag).reverse }
.project('Region, 'avgMag)
.write(sourceFromArg(args, "output"))
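To make the two-step logic explicit (aggregate per region first, then sort the aggregated rows as one group), here is a minimal sketch using plain Scala collections rather than Scalding. The names `Quake`, `SortByAverage`, and `averageByRegionDesc` are made up for illustration, and the third sample row is invented. It models only the logic, not how Scalding distributes the work.

```scala
// Local model of the pipeline's logic with plain Scala collections (no Scalding dependency).
// Quake, SortByAverage and averageByRegionDesc are hypothetical names for this sketch.
final case class Quake(region: String, magnitude: Double)

object SortByAverage {
  // Step 1: compute the average magnitude per region.
  // Step 2: sort the aggregated rows globally, largest average first.
  def averageByRegionDesc(quakes: Seq[Quake]): Seq[(String, Double)] =
    quakes
      .groupBy(_.region)
      .map { case (region, qs) => region -> qs.map(_.magnitude).sum / qs.size }
      .toSeq
      .sortBy { case (_, avgMag) => -avgMag }

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      Quake("Southern California", 1.3),
      Quake("Central California", 2.6),
      Quake("Southern California", 1.7) // invented row, so one region has two values
    )
    averageByRegionDesc(sample).foreach { case (region, avgMag) =>
      println(s"$region\t$avgMag")
    }
  }
}
```

As far as I know, Scalding's fields API also has a `groupAll` method that puts the whole pipe into one group, so `.groupAll { _.sortBy('avgMag).reverse }` should be able to replace the insert-plus-groupBy('dummy) pair. Either way, all rows go through a single group for the final sort, which is fine for a small per-region summary like this one.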
Upvotes: 1