DollarChills

Reputation: 1086

Group by key in array and get max and average

I have an array, nested under a data key, that is structured like this:

{"status": "ok", "data": [{"temp": 22, "wind": 351.0, "datetime": "20160815-0330"}, {"temp": 21, "wind": 321.0, "datetime": "20160815-0345"}]}

I'm looking to group by the datetime key (ignoring the time), find the max temp and the average wind.

I've tried the following, but I'm unsure how to compute both the max and the average in the same map:

@data['data'].group_by { |d| d.values_at("datetime") }.map { |_, v| v.max_by { |h| h["temp"] } }

Upvotes: 1

Views: 535

Answers (2)

Mohanraj

Reputation: 4200

The other answer's method works well, but the code seems a bit complicated: it calls max_by, reads [:temp] from the result, sums the winds separately, and then needs an explicit to_h. If performance and readability matter to you, a plain each over the already-grouped hash is an alternative:

data = {"20160815"=>[{:temp=>22, :wind=>351.0, :datetime=>"20160815-0330"}, {:temp=>21, :wind=>321.0, :datetime=>"20160815-0345"}]}
# Returns an Array containing one single-key Hash per date.
data.map do |k, v|
  winds = []
  temps = []
  # Collect both series in a single pass over the day's readings.
  v.each do |item|
    winds << item[:wind]
    temps << item[:temp]
  end
  {k => {max_temp: temps.max, avg_wind: winds.inject(:+).to_f/winds.length}}
end

Because map returns an array, the output is an array with one hash per date:

# => [{"20160815"=>{:max_temp=>22, :avg_wind=>336.0}}]
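If a single hash keyed by date is preferred, the same idea can be written to build [key, value] pairs and call to_h at the end. This is only a minimal sketch: the helper name summarize_by_date is hypothetical, and it assumes the input is already grouped by date with symbol keys, as in the sample above.

```ruby
# Hypothetical helper (not from the original answer): summarizes an
# already-grouped Hash and returns one Hash keyed by date.
def summarize_by_date(grouped)
  grouped.map do |date, readings|
    temps = readings.map { |r| r[:temp] }
    winds = readings.map { |r| r[:wind] }
    # Emit a [key, value] pair so that to_h can assemble the final Hash.
    [date, { max_temp: temps.max, avg_wind: winds.sum.to_f / winds.length }]
  end.to_h
end

grouped = {
  "20160815" => [
    { temp: 22, wind: 351.0, datetime: "20160815-0330" },
    { temp: 21, wind: 321.0, datetime: "20160815-0345" }
  ]
}

summary = summarize_by_date(grouped)
# summary["20160815"] holds max_temp 22 and avg_wind 336.0
```

Array#sum needs Ruby 2.4 or later; on older versions inject(:+) does the same job.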

Below is a small benchmark comparing the each-based version with the max_by-based one:

require 'benchmark/ips' # from the benchmark-ips gem

data = {"20160815"=>[{:temp=>22, :wind=>351.0, :datetime=>"20160815-0330"}, {:temp=>21, :wind=>321.0, :datetime=>"20160815-0345"}]}

def by_each(data)
  data.map do |k, v|
    winds = []
    temps = []
    v.each do |item|
      winds << item[:wind]
      temps << item[:temp]
    end
    {k => {max_temp: temps.max, avg_wind: winds.inject(:+).to_f/winds.length}}
  end
end

def by_max(data)
  data.map do |date, values|
    [date, {
       maximum_temp: values.max_by { |value| value[:temp] }[:temp],
       average_wind: values.sum { |value| value[:wind] }.to_f / values.length
     }]
  end.to_h
end

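Benchmark.ips do |x|
  # benchmark-ips reads :time and :warmup here; :times appears to be ignored,
  # which would match the roughly 5-second runs in the output.
  x.config(times: 10)
  x.report 'BY_EACH' do
    by_each(data)
  end
  x.report 'BY_MAX' do
    by_max(data)
  end
  x.compare!
end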

The benchmark output is:

Warming up --------------------------------------
             BY_EACH    18.894k i/100ms
              BY_MAX    13.793k i/100ms
Calculating -------------------------------------
             BY_EACH    226.160k (± 5.3%) i/s -      1.134M in   5.025488s
              BY_MAX    154.745k (± 5.8%) i/s -    772.408k in   5.006365s

Comparison:
             BY_EACH:   226159.5 i/s
              BY_MAX:   154744.8 i/s - 1.46x  slower

So on this two-record sample, BY_MAX ran about 1.46 times slower than BY_EACH. Of course, you can use whichever approach you find easier to understand and work with.
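Since the two methods use different key names and return different container types (an array of hashes versus a single hash), it may be worth confirming that they compute the same values before reading much into the timing. A rough sketch of such a check, reusing the two method bodies from the benchmark; the variable names each_result, max_result and same_values are only illustrative:

```ruby
def by_each(data)
  data.map do |k, v|
    winds = []
    temps = []
    v.each do |item|
      winds << item[:wind]
      temps << item[:temp]
    end
    {k => {max_temp: temps.max, avg_wind: winds.inject(:+).to_f / winds.length}}
  end
end

def by_max(data)
  data.map do |date, values|
    [date, {
      maximum_temp: values.max_by { |value| value[:temp] }[:temp],
      average_wind: values.sum { |value| value[:wind] }.to_f / values.length
    }]
  end.to_h
end

data = {"20160815"=>[{:temp=>22, :wind=>351.0, :datetime=>"20160815-0330"}, {:temp=>21, :wind=>321.0, :datetime=>"20160815-0345"}]}

# by_each returns an Array of single-key Hashes, so merge them into one Hash.
each_result = by_each(data).reduce({}, :merge)
max_result  = by_max(data)

# Compare the numbers date by date, ignoring the differing key names.
same_values = each_result.all? do |date, stats|
  other = max_result.fetch(date)
  stats[:max_temp] == other[:maximum_temp] && stats[:avg_wind] == other[:average_wind]
end
```

With only two readings in one group, the timings mostly reflect fixed per-call overhead, so the ratio may well differ on larger inputs.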

Upvotes: 0

Simple Lime

Reputation: 11035

When you write "data": { ... } in a Ruby hash literal, the key becomes the symbol :data, not a string, so you would need to do something like:

@data[:data].group_by { |data| data[:datetime].split('-')[0] }

in order to group by the :datetime key while ignoring the time portion (I assume the time portion is everything after the -). You then end up with a hash that looks like:

{"20160815"=>[{:temp=>22, :wind=>351.0, :datetime=>"20160815-0330"}, {:temp=>21, :wind=>321.0, :datetime=>"20160815-0345"}]}

To find the max :temp and the average :wind, you can then do:

results = @data[:data].group_by { |data| data[:datetime].split('-')[0] }.map do |date, values|
  [date, {
    maximum_temp: values.max_by { |value| value[:temp] }[:temp],
    average_wind: values.sum { |value| value[:wind] }.to_f / values.length
  }]
end.to_h
# => {"20160815"=>{:maximum_temp=>22, :average_wind=>336.0}}
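If the data instead comes from parsing a JSON string (the question's @data['data'] suggests it might), the keys will be strings rather than symbols. A minimal sketch of the same grouping under that assumption; the raw string below is just the sample from the question, and the variable names are illustrative:

```ruby
require 'json'

# Assumption: the payload arrives as a JSON string shaped like the question's sample.
raw = '{"status": "ok", "data": [{"temp": 22, "wind": 351.0, "datetime": "20160815-0330"}, {"temp": 21, "wind": 321.0, "datetime": "20160815-0345"}]}'

parsed = JSON.parse(raw) # keys are Strings here, e.g. parsed["data"]

# Group on the date part of "YYYYMMDD-HHMM", i.e. everything before the dash.
grouped = parsed["data"].group_by { |row| row["datetime"].split("-").first }

results = grouped.map do |date, rows|
  [date, {
    maximum_temp: rows.map { |row| row["temp"] }.max,
    average_wind: rows.sum { |row| row["wind"] }.to_f / rows.length
  }]
end.to_h
# results["20160815"] holds maximum_temp 22 and average_wind 336.0
```

Alternatively, JSON.parse(raw, symbolize_names: true) yields symbol keys, in which case the symbol-based code above works unchanged.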

Upvotes: 1
