Jenny Blunt

Reputation: 1596

Calculating avg. in deeply nested hash and then group by another field

I'm trying to work out the most efficient way to loop through some deeply nested data, find the average of the values and return a new hash with the data grouped by the date.

The raw data looks like this:

[
    {
        client_id: 2,
        date: "2015-11-14",
        txbps: {
            "22"=>{
                "43"=>17870.153846153848,
                "44"=>15117.866666666667
            }
        }
    },
    {
        client_id: 1,
        date: "2015-11-14",
        txbps: {
            "22"=>{
                "43"=>38113.846153846156,
                "44"=>33032.0
            }
        }
    },
    {
        client_id: 4,
        date: "2015-11-14",
        txbps: {
            "22"=>{
                "43"=>299960.0,
                "44"=>334182.4
            }
        }
    }
]

I have about 10,000,000 of these to loop through so I'm a little worried about performance.

The end result needs to look like this, where avg is the average of the txbps values for that date:

[
    {
        date: "2015-11-14",
        avg: 178730.153846153848
    },
    {
        date: "2015-11-15",
        avg: 123987.192873978987
    },
    {
        date: "2015-11-16",
        avg: 126335.982123876283
    }
]

I've tried this to start:

results.map { |val| val["txbps"].values.map { |a| a.values.sum } }

But that's giving me this:

[[5211174.189281798, 25998.222222222223], [435932.442835184, 56051.555555555555], [5718452.806735582, 321299.55555555556]]

And I just can't figure out how to get it done. I can't find any good references online either.

I also tried to group by the date first:

res.map { |date, values| values.map { |client| client["txbps"].map { |tx,a| { date: date, client_id: client[':'], tx: (a.values.inject(:+) / a.size).to_i } } } }.flatten

[
    {
        :date=>"2015-11-14",
        :client_id=>"2",
        :tx=>306539
    },
    {
        :date=>"2015-11-14",
        :client_id=>"2",
        :tx=>25998
    },
    {
        :date=>"2015-11-14",
        :client_id=>"2",
        :tx=>25643
    },
    {
        :date=>"2015-11-14",
        :client_id=>"2",
        :tx=>56051
    },
    {
        :date=>"2015-11-14",
        :client_id=>"1",
        :tx=>336379
    },
    {
        :date=>"2015-11-14",
        :client_id=>"1",
        :tx=>321299
    }
]

If possible, how can I do this in a single run?

---- EDIT ----

Got a little bit further:

res.map { |a,b|
  {
    date: a[:date], val: a["txbps"].values.map { |k,v|
      k.values.sum / k.size
    }.first
  }
}.
group_by { |el| el[:date] }.map { |date,list|
  {
    key: date, val: list.map { |elem| elem[:val] }.reduce(:+) / list.size
  }
}

But that's epic - is there a faster, simpler way??

Upvotes: 0

Views: 629

Answers (2)

Cary Swoveland

Reputation: 110755

Data Structure

I assume your input data is an array of hashes. For example:

arr = [
  {
    client_id: 2,
    date: "2015-11-14",
    txbps: {
      "22"=>{
        "43"=>17870.15,
        "44"=>15117.86
      }
    }
  },
  {
    client_id: 1,
    date: "2015-11-15",
    txbps: {
      "22"=>{
        "43"=>38113.84,
        "44"=>33032.03,
      }
    }
  },

  {
    client_id: 4,
    date: "2015-11-14",
    txbps: {
      "22"=>{
        "43"=>299960.0,
        "44"=>334182.4
      }
    }
  },
  {
    client_id: 3,
    date: "2015-11-15",
    txbps: {
      "22"=>{
        "43"=>17870.15,
        "44"=>15117.86
      }
    }
  }
]

Code

Based on my understanding of the problem, you can compute averages as follows:

def averages(arr)
  h = arr.each_with_object(Hash.new { |h,k| h[k] = [] }) { |g,h|
    g[:txbps].values.each { |f| h[g[:date]].concat(f.values) } }
  h.merge(h) { |_,v| (v.reduce(:+)/(v.size.to_f)).round(2) }
end

Example

For arr above:

avgs = averages(arr)
  #=> {"2015-11-14"=>166782.6, "2015-11-15"=>26033.47} 

The value of the hash h in the first line of the method was:

{"2015-11-14"=>[17870.15, 15117.86, 299960.0, 334182.4],
 "2015-11-15"=>[38113.84, 33032.03, 17870.15, 15117.86]} 
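
With around 10,000,000 records, holding every value in h until the end may use more memory than necessary. As a rough sketch (not benchmarked; running_averages is a hypothetical name and the sample data is invented), the same averages can be obtained by keeping only a running sum and count per date:

```ruby
# Hypothetical variant: stores a [sum, count] pair per date instead of
# an array holding every value seen for that date.
def running_averages(arr)
  totals = Hash.new { |hash, date| hash[date] = [0.0, 0] }
  arr.each do |g|
    pair = totals[g[:date]]
    g[:txbps].each_value do |f|
      f.each_value do |v|
        pair[0] += v  # running sum for this date
        pair[1] += 1  # running count for this date
      end
    end
  end
  # Divide once per date at the very end.
  totals.map { |date, (sum, count)| { date: date, avg: (sum / count).round(2) } }
end

# Invented sample data, small enough to check by hand.
sample = [
  { date: "a", txbps: { "22" => { "43" => 1.0, "44" => 3.0 } } },
  { date: "b", txbps: { "22" => { "43" => 10.0 } } },
  { date: "a", txbps: { "23" => { "1" => 5.0 } } }
]
result = running_averages(sample)
# "a" averages (1.0 + 3.0 + 5.0) / 3 = 3.0 and "b" averages 10.0
```

This still makes a single pass over arr; the trade-off is that the individual values are no longer available afterwards, for example for a median.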

Convert hash returned by averages to desired array of hashes

avgs is not in the form of the output desired. It's a simple matter to do the conversion, but you might consider leaving the hash output in this format. The conversion is simply:

avgs.map { |d,avg| { date: d, avg: avg } }
 #=> [{:date=>"2015-11-14", :avg=>166782.6},
 #    {:date=>"2015-11-15", :avg=>26033.47}]

Explanation

Rather than explain in detail how the method works, I will instead give an alternative form of the method that does exactly the same thing, but in a more verbose and slightly less Ruby-like way. I've also included the conversion of the hash to an array of hashes at the end:

def averages(arr)
  h = {}
  arr.each do |g|
    vals = g[:txbps].values      
    vals.each do |f|
      date = g[:date]
      h[date] = [] unless h.key?(date)
      h[date].concat(f.values)
    end
  end

  keys = h.keys
  keys.each do |k|
    val = h[k]
    h[k] = (val.reduce(:+)/(val.size.to_f)).round(2)
  end

  h.map { |d,avg| { date: d, avg: avg } }
end

Now let me insert some puts statements to print out various intermediate values in the calculations, to help explain what's going on:

def averages(arr)
  h = {}
  arr.each do |g|
    puts "g=#{g}"
    vals = g[:txbps].values      
    puts "vals=#{vals}"
    vals.each do |f|
      puts "  f=#{f}"
      date = g[:date]
      puts "  date=#{date}"
      h[date] = [] unless h.key?(date)
      puts "  before concat, h=#{h}"
      h[date].concat(f.values)
      puts "  after concat, h=#{h}"
    end
    puts
  end

  puts "h=#{h}"
  keys = h.keys
  puts "keys=#{keys}"

  keys.each do |k|
    val = h[k]
    puts "  k=#{k}, val=#{val}"
    puts "  val.reduce(:+)=#{val.reduce(:+)}"
    puts "  val.size.to_f=#{val.size.to_f}"
    h[k] = (val.reduce(:+)/(val.size.to_f)).round(2)
    puts "  h[#{k}]=#{h[k]}"
    puts
  end

  h.map { |d,avg| { date: d, avg: avg } }
end

Execute averages once more:

averages(arr)

g={:client_id=>2, :date=>"2015-11-14", :txbps=>{"22"=>{"43"=>17870.15, "44"=>15117.86}}}
vals=[{"43"=>17870.15, "44"=>15117.86}]
  f={"43"=>17870.15, "44"=>15117.86}
  date=2015-11-14
  before concat, h={"2015-11-14"=>[]}
  after concat, h={"2015-11-14"=>[17870.15, 15117.86]}

g={:client_id=>1, :date=>"2015-11-15", :txbps=>{"22"=>{"43"=>38113.84, "44"=>33032.03}}}
vals=[{"43"=>38113.84, "44"=>33032.03}]
  f={"43"=>38113.84, "44"=>33032.03}
  date=2015-11-15
  before concat, h={"2015-11-14"=>[17870.15, 15117.86], "2015-11-15"=>[]}
  after concat, h={"2015-11-14"=>[17870.15, 15117.86], "2015-11-15"=>[38113.84, 33032.03]}

g={:client_id=>4, :date=>"2015-11-14", :txbps=>{"22"=>{"43"=>299960.0, "44"=>334182.4}}}
vals=[{"43"=>299960.0, "44"=>334182.4}]
  f={"43"=>299960.0, "44"=>334182.4}
  date=2015-11-14
  before concat, h={"2015-11-14"=>[17870.15, 15117.86],
                    "2015-11-15"=>[38113.84, 33032.03]}
  after concat, h={"2015-11-14"=>[17870.15, 15117.86, 299960.0, 334182.4],
                   "2015-11-15"=>[38113.84, 33032.03]}

g={:client_id=>3, :date=>"2015-11-15", :txbps=>{"22"=>{"43"=>17870.15, "44"=>15117.86}}}
vals=[{"43"=>17870.15, "44"=>15117.86}]
  f={"43"=>17870.15, "44"=>15117.86}
  date=2015-11-15
  before concat, h={"2015-11-14"=>[17870.15, 15117.86, 299960.0, 334182.4],
                    "2015-11-15"=>[38113.84, 33032.03]}
  after concat, h={"2015-11-14"=>[17870.15, 15117.86, 299960.0, 334182.4],
                   "2015-11-15"=>[38113.84, 33032.03, 17870.15, 15117.86]}

h={"2015-11-14"=>[17870.15, 15117.86, 299960.0, 334182.4],
   "2015-11-15"=>[38113.84, 33032.03, 17870.15, 15117.86]}
keys=["2015-11-14", "2015-11-15"]
  k=2015-11-14, val=[17870.15, 15117.86, 299960.0, 334182.4]
  val.reduce(:+)=667130.41
  val.size.to_f=4.0
  h[2015-11-14]=166782.6

  k=2015-11-15, val=[38113.84, 33032.03, 17870.15, 15117.86]
  val.reduce(:+)=104133.87999999999
  val.size.to_f=4.0
  h[2015-11-15]=26033.47

  #=> [{:date=>"2015-11-14", :avg=>166782.6},
  #    {:date=>"2015-11-15", :avg=>26033.47}]

Upvotes: 1

Ben Aubin

Reputation: 5667

Try #inject

Like .map, it's a way of converting an enumerable (array, hash, pretty much anything you can loop over in Ruby) into a different object. It's a lot more flexible than .map, which is super helpful. Sadly, that flexibility makes the method hard to wrap your head around. I think Drew Olson explains it best in his answer:

You can think of the first block argument as an accumulator: the result of each run of the block is stored in the accumulator and then passed to the next execution of the block. In the case of the code shown above, you are defaulting the accumulator, result, to 0. Each run of the block adds the given number to the current total and then stores the result back into the accumulator. The next block call has this new value, adds to it, stores it again, and repeats.

Examples:

To sum all the numbers in an array (with #inject), you can do this:

array = [5,10,7,8]
#            |- Initial Value   
array.inject(0) { |sum, n| sum + n } #=> 30
#                   |- You return the new value for the accumulator in this block.

To find the average of an array of numbers, find the sum and then divide. If you instead divide num inside the inject block ({|sum, num| sum + (num / array.size)}), you do one division per element instead of one in total.

array = [5,10,7,8]
array.inject(0.0) { |sum, num| sum + num } / array.size #=> 7.5
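
To illustrate where the division happens, here is a small sketch that computes the mean both ways (the variable names are only for illustration). Note that with integer elements, num / array.size would be integer division and would truncate, so the per-element version below converts to a float first:

```ruby
array = [5, 10, 7, 8]

# One division in total: sum first, then divide.
mean_once = array.inject(0.0) { |sum, num| sum + num } / array.size

# One division per element. num.to_f avoids integer division,
# which would truncate 5 / 4 to 1.
mean_each = array.inject(0.0) { |sum, num| sum + num.to_f / array.size }

puts mean_once  # 7.5
puts mean_each  # 7.5
```

Both give 7.5 here, but the second does four divisions instead of one, and with less convenient floats it can differ in the last digits because of rounding.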

Method

If creating methods on classes is your style, you can define a method on the Array class (from John Feminella's answer). Put this code somewhere before you need to find the sum or mean of an array:

class Array
  def sum
    inject(0.0) { |result, el| result + el }
  end

  def mean 
    sum / size
  end
end

And then:

[5,10,7,8].sum #=> 30.0
[5,10,7,8].mean #=> 7.5

Gem

If you like putting code in black boxes, or really precious minerals, then you can use the average gem by fegoa89: gem install average. It also supports #mode and #median.

[5,10,7,8].mean #=> 7.5

Solution:

Assuming your objects look like this:

data = [
    {
        date: "2015-11-14",
        ...
        txbps: {...},
    },
    {
        date: "2015-11-14",
        ...
        txbps: {...},
    },
    ...
]

This code does what you need, but it's somewhat complex.

class Array
  def sum
    inject(0.0) { |result, el| result + el }
  end

  def mean 
    sum / size
  end
end

data = (data.inject({}) do |hash, item|
    this = (item[:txbps].values.map {|i| i.values}).flatten # Get values of values of `txbps`
    hash[item[:date]] = (hash[item[:date]] || []) + this # If a list already exists for this date, use it, otherwise create a new list, and add the info we created above.
    hash # Return the hash for future use
end).map do |day, value| 
    {date: day, avg: value.mean} # Clean data
end

This collects the values into arrays grouped by date, then averages each array, giving one hash per date, such as:

{:date=>"2015-11-14", :avg=>123046.04444444446}
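
If you would rather not reopen Array, a roughly equivalent sketch uses group_by and flat_map instead. The helper name average_by_date is made up here, and the sample data is invented to keep the example runnable:

```ruby
# Hypothetical helper; not part of the code above.
def average_by_date(data)
  data.group_by { |item| item[:date] }.map do |day, items|
    # Collect every innermost txbps value for this date.
    values = items.flat_map { |item| item[:txbps].values.flat_map(&:values) }
    { date: day, avg: values.inject(0.0) { |sum, v| sum + v } / values.size }
  end
end

# Invented sample data, small enough to check by hand.
sample = [
  { date: "2015-11-14", txbps: { "22" => { "43" => 10.0, "44" => 20.0 } } },
  { date: "2015-11-15", txbps: { "22" => { "43" => 4.0 } } },
  { date: "2015-11-14", txbps: { "22" => { "43" => 30.0 } } }
]

averages = average_by_date(sample)
# avg is 20.0 for "2015-11-14" and 4.0 for "2015-11-15"
```

Note that group_by builds an intermediate array of records per date, so for very large inputs an accumulating loop may be lighter on memory.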

Upvotes: 1
