Reputation: 1596
I'm trying to work out the most efficient way to loop through some deeply nested data, find the average of the values and return a new hash with the data grouped by the date.
The raw data looks like this:
[
client_id: 2,
date: "2015-11-14",
txbps: {
"22"=>{
"43"=>17870.153846153848,
"44"=>15117.866666666667
}
},
client_id: 1,
date: "2015-11-14",
txbps: {
"22"=>{
"43"=>38113.846153846156,
"44"=>33032.0
}
},
client_id: 4,
date: "2015-11-14",
txbps: {
"22"=>{
"43"=>299960.0,
"44"=>334182.4
}
},
]
I have about 10,000,000 of these to loop through so I'm a little worried about performance.
The end result needs to look like this. The values need to be the average of the txbps:
[
{
date: "2015-11-14",
avg: 178730.153846153848
},
{
date: "2015-11-15",
avg: 123987.192873978987
},
{
date: "2015-11-16",
avg: 126335.982123876283
}
]
I've tried this to start:
results.map { |val| val["txbps"].values.map { |a| a.values.sum } }
But that's giving me this:
[[5211174.189281798, 25998.222222222223], [435932.442835184, 56051.555555555555], [5718452.806735582, 321299.55555555556]]
And I just can't figure out how to get it done. I can't find any good references online either.
I also tried to group by the date first:
res.map { |date, values| values.map { |client| client["txbps"].map { |tx,a| { date: date, client_id: client['client_id'], tx: (a.values.inject(:+) / a.size).to_i } } } }.flatten
[
  {
    :date=>"2015-11-14",
    :client_id=>"2",
    :tx=>306539
  },
  {
    :date=>"2015-11-14",
    :client_id=>"2",
    :tx=>25998
  },
  {
    :date=>"2015-11-14",
    :client_id=>"2",
    :tx=>25643
  },
  {
    :date=>"2015-11-14",
    :client_id=>"2",
    :tx=>56051
  },
  {
    :date=>"2015-11-14",
    :client_id=>"1",
    :tx=>336379
  },
  {
    :date=>"2015-11-14",
    :client_id=>"1",
    :tx=>321299
  }
]
If possible, how can I do this in a single pass?
---- EDIT ----
Got a little bit further:
res.map { |a,b|
  {
    date: a[:date], val: a["txbps"].values.map { |k,v|
      k.values.sum / k.size
    }.first
  }
}.
group_by { |el| el[:date] }.map { |date,list|
  {
    key: date, val: list.map { |elem| elem[:val] }.reduce(:+) / list.size
  }
}
But that's epic - is there a faster, simpler way??
Upvotes: 0
Views: 629
Reputation: 110755
Data Structure
I assume your input data is an array of hashes. For example:
arr = [
{
client_id: 2,
date: "2015-11-14",
txbps: {
"22"=>{
"43"=>17870.15,
"44"=>15117.86
}
}
},
{
client_id: 1,
date: "2015-11-15",
txbps: {
"22"=>{
"43"=>38113.84,
"44"=>33032.03,
}
}
},
{
client_id: 4,
date: "2015-11-14",
txbps: {
"22"=>{
"43"=>299960.0,
"44"=>334182.4
}
}
},
{
client_id: 3,
date: "2015-11-15",
txbps: {
"22"=>{
"43"=>17870.15,
"44"=>15117.86
}
}
}
]
Code
Based on my understanding of the problem, you can compute averages as follows:
def averages(arr)
  h = arr.each_with_object(Hash.new { |h,k| h[k] = [] }) { |g,h|
    g[:txbps].values.each { |f| h[g[:date]].concat(f.values) } }
  h.merge(h) { |_,v| (v.reduce(:+)/(v.size.to_f)).round(2) }
end
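The method above depends on Hash.new being given a block, so that each missing date gets its own fresh array. A minimal sketch of that behaviour, using made-up values, next to the Hash.new([]) form, which shares a single array across all keys:

```ruby
# With a block, each missing key is assigned its own new array on first access.
by_date = Hash.new { |hash, key| hash[key] = [] }
by_date["2015-11-14"].concat([1.0, 2.0])
by_date["2015-11-15"].concat([3.0])

# With a default object, every missing key returns the same array,
# and nothing is ever stored under the key itself.
shared = Hash.new([])
shared["2015-11-14"].concat([1.0, 2.0])
shared["2015-11-15"].concat([3.0])

p by_date      # two separate arrays, one per date
p shared       # still empty: no key was ever assigned
p shared["x"]  # the single shared default array, holding all three values
```

This is why the block form is the one to use when the default value is going to be mutated.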
Example
For arr above:
avgs = averages(arr)
#=> {"2015-11-14"=>166782.6, "2015-11-15"=>26033.47}
The value of the hash h in the first line of the method was:
{"2015-11-14"=>[17870.15, 15117.86, 299960.0, 334182.4],
"2015-11-15"=>[38113.84, 33032.03, 17870.15, 15117.86]}
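Since the question mentions roughly 10,000,000 records, it may be worth noting that this intermediate hash keeps every value in memory. As a sketch of an alternative (not part of the method above), the following keeps only a running sum and count per date. The name streaming_averages and the sample data are made up for illustration, and it assumes the same symbol-keyed structure as arr; whether it is actually faster would need benchmarking.

```ruby
# Hypothetical variant: keep a running [sum, count] per date instead of
# collecting every value into an array. Assumes each date has at least one value.
def streaming_averages(records)
  totals = Hash.new { |hash, date| hash[date] = [0.0, 0] }
  records.each do |record|
    pair = totals[record[:date]]
    record[:txbps].each_value do |inner|
      inner.each_value do |value|
        pair[0] += value   # running sum
        pair[1] += 1       # running count
      end
    end
  end
  totals.map { |date, (sum, count)| { date: date, avg: (sum / count).round(2) } }
end

sample = [
  { client_id: 2, date: "2015-11-14", txbps: { "22" => { "43" => 10.0, "44" => 20.0 } } },
  { client_id: 1, date: "2015-11-15", txbps: { "22" => { "43" => 1.0, "44" => 2.0 } } },
  { client_id: 4, date: "2015-11-14", txbps: { "22" => { "43" => 30.0, "44" => 40.0 } } }
]

p streaming_averages(sample)
```

For the sample data this yields an average of 25.0 for "2015-11-14" and 1.5 for "2015-11-15".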
Convert hash returned by averages to desired array of hashes
avgs is not in the form of the output desired. It's a simple matter to do the conversion, but you might consider leaving the hash output in this format. The conversion is simply:
avgs.map { |d,avg| { date: d, avg: avg } }
#=> [{:date=>"2015-11-14", :avg=>166782.6},
# {:date=>"2015-11-15", :avg=>26033.47}]
Explanation
Rather than explain in detail how the method works, I will instead give an alternative form of the method that does exactly the same thing, but in a more verbose and slightly less Ruby-like way. I've also included the conversion of the hash to an array of hashes at the end:
def averages(arr)
  h = {}
  arr.each do |g|
    vals = g[:txbps].values
    vals.each do |f|
      date = g[:date]
      h[date] = [] unless h.key?(date)
      h[date].concat(f.values)
    end
  end
  keys = h.keys
  keys.each do |k|
    val = h[k]
    h[k] = (val.reduce(:+)/(val.size.to_f)).round(2)
  end
  h.map { |d,avg| { date: d, avg: avg } }
end
Now let me insert some puts statements to print out various intermediate values in the calculations, to help explain what's going on:
def averages(arr)
  h = {}
  arr.each do |g|
    puts "g=#{g}"
    vals = g[:txbps].values
    puts "vals=#{vals}"
    vals.each do |f|
      puts " f=#{f}"
      date = g[:date]
      puts " date=#{date}"
      h[date] = [] unless h.key?(date)
      puts " before concat, h=#{h}"
      h[date].concat(f.values)
      puts " after concat, h=#{h}"
    end
    puts
  end
  puts "h=#{h}"
  keys = h.keys
  puts "keys=#{keys}"
  keys.each do |k|
    val = h[k]
    puts " k=#{k}, val=#{val}"
    puts " val.reduce(:+)=#{val.reduce(:+)}"
    puts " val.size.to_f=#{val.size.to_f}"
    h[k] = (val.reduce(:+)/(val.size.to_f)).round(2)
    puts " h[#{k}]=#{h[k]}"
    puts
  end
  h.map { |d,avg| { date: d, avg: avg } }
end
Execute averages once more:
averages(arr)
g={:client_id=>2, :date=>"2015-11-14", :txbps=>{"22"=>{"43"=>17870.15, "44"=>15117.86}}}
vals=[{"43"=>17870.15, "44"=>15117.86}]
f={"43"=>17870.15, "44"=>15117.86}
date=2015-11-14
before concat, h={"2015-11-14"=>[]}
after concat, h={"2015-11-14"=>[17870.15, 15117.86]}
g={:client_id=>1, :date=>"2015-11-15", :txbps=>{"22"=>{"43"=>38113.84, "44"=>33032.03}}}
vals=[{"43"=>38113.84, "44"=>33032.03}]
f={"43"=>38113.84, "44"=>33032.03}
date=2015-11-15
before concat, h={"2015-11-14"=>[17870.15, 15117.86], "2015-11-15"=>[]}
after concat, h={"2015-11-14"=>[17870.15, 15117.86], "2015-11-15"=>[38113.84, 33032.03]}
g={:client_id=>4, :date=>"2015-11-14", :txbps=>{"22"=>{"43"=>299960.0, "44"=>334182.4}}}
vals=[{"43"=>299960.0, "44"=>334182.4}]
f={"43"=>299960.0, "44"=>334182.4}
date=2015-11-14
before concat, h={"2015-11-14"=>[17870.15, 15117.86],
"2015-11-15"=>[38113.84, 33032.03]}
after concat, h={"2015-11-14"=>[17870.15, 15117.86, 299960.0, 334182.4],
"2015-11-15"=>[38113.84, 33032.03]}
g={:client_id=>3, :date=>"2015-11-15", :txbps=>{"22"=>{"43"=>17870.15, "44"=>15117.86}}}
vals=[{"43"=>17870.15, "44"=>15117.86}]
f={"43"=>17870.15, "44"=>15117.86}
date=2015-11-15
before concat, h={"2015-11-14"=>[17870.15, 15117.86, 299960.0, 334182.4],
"2015-11-15"=>[38113.84, 33032.03]}
after concat, h={"2015-11-14"=>[17870.15, 15117.86, 299960.0, 334182.4],
"2015-11-15"=>[38113.84, 33032.03, 17870.15, 15117.86]}
h={"2015-11-14"=>[17870.15, 15117.86, 299960.0, 334182.4],
"2015-11-15"=>[38113.84, 33032.03, 17870.15, 15117.86]}
keys=["2015-11-14", "2015-11-15"]
k=2015-11-14, val=[17870.15, 15117.86, 299960.0, 334182.4]
val.reduce(:+)=667130.41
val.size.to_f=4.0
h[2015-11-14]=166782.6
k=2015-11-15, val=[38113.84, 33032.03, 17870.15, 15117.86]
val.reduce(:+)=104133.87999999999
val.size.to_f=4.0
h[2015-11-15]=26033.47
#=> [{:date=>"2015-11-14", :avg=>166782.6},
# {:date=>"2015-11-15", :avg=>26033.47}]
Upvotes: 1
Reputation: 5667
Try #inject
Like .map, it's a way of converting an enumerable (an array, a hash, pretty much anything you can loop over in Ruby) into a different object. Compared to .map, it's a lot more flexible, which is super helpful. Sadly, that flexibility comes at the cost of the method being hard to wrap your head around. I think Drew Olson explains it best in his answer:
You can think of the first block argument as an accumulator: the result of each run of the block is stored in the accumulator and then passed to the next execution of the block. In the case of the code shown above, you are defaulting the accumulator, result, to 0. Each run of the block adds the given number to the current total and then stores the result back into the accumulator. The next block call has this new value, adds to it, stores it again, and repeats.
To sum all the numbers in an array (with #inject), you can do this:
array = [5,10,7,8]
# The argument 0 is the initial value of the accumulator.
array.inject(0) { |sum, n| sum + n } #=> 30
# The block returns the new value for the accumulator.
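Because the accumulator can be any object, the same pattern can build a hash rather than a number. A small sketch with made-up word data, where the block has to return the accumulator on every pass:

```ruby
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# The accumulator starts as an empty hash and is returned from the block each time.
by_letter = words.inject({}) do |groups, word|
  key = word[0]
  groups[key] = (groups[key] || []) + [word]
  groups   # the block's return value becomes the next accumulator
end

p by_letter
```

Forgetting the final `groups` line is a common slip: the block would then return the result of the assignment (an array), and the next pass would receive that array as its accumulator instead of the hash.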
To find the average of an array of numbers, you can find the sum and then divide. If you instead divide num inside the inject block ({ |sum, num| sum + (num / array.size) }), you do one division per element rather than a single division at the end.
array = [5,10,7,8]
array.inject(0.0) { |sum, num| sum + num } / array.size #=> 7.5
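There is one more thing to watch with the divide-inside-the-block form: when the array holds integers, num / array.size is integer division, so each term is truncated before it is added. A quick sketch comparing the forms:

```ruby
array = [5, 10, 7, 8]

# Dividing once at the end: the float accumulator keeps the final division exact.
outside = array.inject(0.0) { |sum, num| sum + num } / array.size

# Dividing inside the block: 5/4, 10/4, 7/4 and 8/4 are truncated to 1, 2, 1, 2.
inside = array.inject(0.0) { |sum, num| sum + (num / array.size) }

# Integer#fdiv forces float division, at the cost of one division per element.
inside_fdiv = array.inject(0.0) { |sum, num| sum + num.fdiv(array.size) }

p outside      #=> 7.5
p inside       #=> 6.0
p inside_fdiv  #=> 7.5
```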
If creating methods on classes is your style, you can define these methods on the Array class (from John Feminella's answer). Put this code somewhere before you need to find the sum or mean of an array:
class Array
  def sum
    inject(0.0) { |result, el| result + el }
  end
  def mean
    sum / size
  end
end
And then:
[5,10,7,8].sum #=> 30.0
[5,10,7,8].mean #=> 7.5
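On Ruby 2.4 and later, Array#sum is built in, so reopening Array to define sum would replace the core method. A sketch that avoids monkey-patching by using a plain helper instead; the name mean_of is made up here:

```ruby
# Hypothetical helper; relies on the built-in Array#sum (Ruby 2.4+).
def mean_of(values)
  return nil if values.empty?   # avoid dividing by zero for an empty list
  values.sum.fdiv(values.size)  # fdiv always performs float division
end

p mean_of([5, 10, 7, 8])  #=> 7.5
p mean_of([])             #=> nil
```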
If you like putting code in black boxes, or really precious minerals, then you can use the average gem by fegoa89: gem install average. It also has support for #mode and #median.
[5,10,7,8].mean #=> 7.5
Assuming your objects look like this:
data = [
{
date: "2015-11-14",
...
txbps: {...},
},
{
date: "2015-11-14",
...
txbps: {...},
},
...
]
This code does what you need, but it's somewhat complex.
class Array
  def sum
    inject(0.0) { |result, el| result + el }
  end
  def mean
    sum / size
  end
end
data = (data.inject({}) do |hash, item|
  this = (item[:txbps].values.map {|i| i.values}).flatten # Get values of values of `txbps`
  hash[item[:date]] = (hash[item[:date]] || []) + this # If a list already exists for this date, use it, otherwise create a new list, and add the info we created above.
  hash # Return the hash for future use
end).map do |day, value|
  {date: day, avg: value.mean} # Clean data
end
This will merge your objects into an array of hashes, one per date, with entries such as:
{:date=>"2015-11-14", :avg=>123046.04444444446}
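For comparison, here is a sketch of the same grouping using group_by and flat_map instead of inject. It assumes Ruby 2.4+ for the built-in sum, and the sample records are made up in the shape described above:

```ruby
records = [
  { date: "2015-11-14", txbps: { "22" => { "43" => 10.0, "44" => 20.0 } } },
  { date: "2015-11-14", txbps: { "22" => { "43" => 30.0 }, "23" => { "43" => 40.0 } } },
  { date: "2015-11-15", txbps: { "22" => { "43" => 1.0, "44" => 2.0 } } }
]

averages = records.group_by { |item| item[:date] }.map do |date, items|
  # Flatten every inner txbps value for this date into one list.
  values = items.flat_map { |item| item[:txbps].values.flat_map(&:values) }
  { date: date, avg: values.sum / values.size }
end

p averages
```

This reads more directly than the inject version, though group_by builds an intermediate hash holding every record, so it is not necessarily lighter on memory.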
Upvotes: 1