Peter Brown

Reputation: 51697

Best way to fill in gaps within multidimensional array in Ruby

I have a multi-dimensional array similar to the example below that I want to group together using Ruby's zip method. I have it working fine when each inner array has the same number of elements, but am running into problems when they are different lengths.

In the example below, the second set is missing a record at 00:15. How would I fill in this missing record?

What am I considering a gap?

It's the timestamp that constitutes a gap. Take a look at my first code sample where I have a comment about the gap being at 00:15. All the other arrays have a hash with this timestamp, so I consider this to be a "missing record" or "gap". The timestamp really could be some other unique string so the fact that they are 15 minutes apart is irrelevant. The values are also irrelevant.

The only approach that comes to mind involves looping over the arrays twice. The first pass would build an array of unique timestamps, and the second pass would fill in the missing record(s) wherever a timestamp is not present. I'm comfortable coding this approach, but it seems a little hacky, and Ruby always seems to surprise me with an elegant and concise solution.

I start with this:

values = [
  [
    {:timestamp => "2011-01-01 00:00", :value => 1},
    {:timestamp => "2011-01-01 00:15", :value => 2},
    {:timestamp => "2011-01-01 00:30", :value => 3}
  ],
  [ # There's a gap here at 00:15
    {:timestamp => "2011-01-01 00:00", :value => 1},
    {:timestamp => "2011-01-01 00:30", :value => 3}
  ],
  [
    {:timestamp => "2011-01-01 00:00", :value => 1},
    {:timestamp => "2011-01-01 00:15", :value => 2},
    {:timestamp => "2011-01-01 00:30", :value => 3}
  ]
]

I want to end with this:

values = [
  [
    {:timestamp => "2011-01-01 00:00", :value => 1},
    {:timestamp => "2011-01-01 00:15", :value => 2},
    {:timestamp => "2011-01-01 00:30", :value => 3}
  ],
  [ # The gap has been filled with a nil value
    {:timestamp => "2011-01-01 00:00", :value => 1},
    {:timestamp => "2011-01-01 00:15", :value => nil},
    {:timestamp => "2011-01-01 00:30", :value => 3}
  ],
  [
    {:timestamp => "2011-01-01 00:00", :value => 1},
    {:timestamp => "2011-01-01 00:15", :value => 2},
    {:timestamp => "2011-01-01 00:30", :value => 3}
  ]
]

When all the arrays are the same size, values.transpose will produce:

[
  [
    {:value=>1, :timestamp=>"2011-01-01 00:00"},
    {:value=>1, :timestamp=>"2011-01-01 00:00"},
    {:value=>1, :timestamp=>"2011-01-01 00:00"}
  ], 
  [
    {:value=>2, :timestamp=>"2011-01-01 00:15"}, 
    {:value=>nil, :timestamp=>"2011-01-01 00:15"},
    {:value=>2, :timestamp=>"2011-01-01 00:15"}
  ], 
  [
    {:value=>3, :timestamp=>"2011-01-01 00:30"}, 
    {:value=>3, :timestamp=>"2011-01-01 00:30"}, 
    {:value=>3, :timestamp=>"2011-01-01 00:30"}
  ]
]
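For context, here is a minimal sketch of what goes wrong when the lengths differ. It uses bare timestamp strings in place of the hashes, and the variable names are illustrative only:

```ruby
full  = ["00:00", "00:15", "00:30"]
short = ["00:00", "00:30"]   # missing the 00:15 record

# zip pads the shorter array at the END, so the rows no longer line up
# by timestamp: "00:15" gets paired with "00:30".
zipped = full.zip(short)

# transpose refuses arrays of different lengths and raises IndexError.
error = begin
  [full, short].transpose
  nil
rescue IndexError => e
  e
end

p zipped
p error.class
```

So the gap has to be filled at the right position before zipping or transposing.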

Upvotes: 3

Views: 840

Answers (3)

Brian Armstrong

Reputation: 19863

Also check out Array#in_groups_of if you're using Rails:

%w(1 2 3 4 5 6 7).in_groups_of(3) {|g| p g}
# ["1", "2", "3"]
# ["4", "5", "6"]
# ["7", nil, nil]

http://weblog.rubyonrails.org/2006/3/1/new-in-rails-enumerable-group_by-and-array-in_groups_of

Upvotes: 0

Phrogz

Reputation: 303205

Here's a working solution; it finds all timestamps, finds the missing timestamps in each set, and then injects them. See comments after the solution for a small improvement you could make with Ruby 1.9.2:

values = [[
  {:timestamp => "2011-01-01 00:00", :value => 1},
  {:timestamp => "2011-01-01 00:15", :value => 2},
  {:timestamp => "2011-01-01 00:30", :value => 3}
],[
  {:timestamp => "2011-01-01 00:00", :value => 1},
  {:timestamp => "2011-01-01 00:30", :value => 3}
],[
  {:timestamp => "2011-01-01 00:00", :value => 1},
  {:timestamp => "2011-01-01 00:15", :value => 2},
  {:timestamp => "2011-01-01 00:30", :value => 3}
]]

all_stamps = values.flatten.map{|x| x[:timestamp]}.uniq.sort
values.each do |set|
  my_stamps = set.map{ |x| x[:timestamp] }.uniq
  missing   = all_stamps - my_stamps
  set.concat( missing.map{ |stamp| {timestamp:stamp, value:nil} } )
  set.replace( set.sort_by{ |x| x[:timestamp] } )
end

require 'pp'
pp values
#=> [[{:timestamp=>"2011-01-01 00:00", :value=>1},
#=>   {:timestamp=>"2011-01-01 00:15", :value=>2},
#=>   {:timestamp=>"2011-01-01 00:30", :value=>3}],
#=>  [{:timestamp=>"2011-01-01 00:00", :value=>1},
#=>   {:timestamp=>"2011-01-01 00:15", :value=>nil},
#=>   {:timestamp=>"2011-01-01 00:30", :value=>3}],
#=>  [{:timestamp=>"2011-01-01 00:00", :value=>1},
#=>   {:timestamp=>"2011-01-01 00:15", :value=>2},
#=>   {:timestamp=>"2011-01-01 00:30", :value=>3}]]

With Ruby 1.9.2 you can replace set.replace( set.sort_by{...} ) with simply set.sort_by!{ ... }. Note also that I've assumed you're using Ruby 1.9 in my hash literal (seen in missing.map...).
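If you would rather not modify the original arrays in place, the same steps can be wrapped in a method that returns new arrays. This is only a sketch: fill_gaps is a hypothetical name, and the shortened timestamps are sample data, not your real input.

```ruby
# Hypothetical helper (the name is illustrative): returns new, gap-filled
# sets and leaves the input arrays untouched.
def fill_gaps(sets)
  all_stamps = sets.flatten.map { |x| x[:timestamp] }.uniq
  sets.map do |set|
    missing = all_stamps - set.map { |x| x[:timestamp] }
    fillers = missing.map { |stamp| { :timestamp => stamp, :value => nil } }
    (set + fillers).sort_by { |x| x[:timestamp] }
  end
end

# Shortened sample data, assumed for illustration.
sample = [
  [{ :timestamp => "00:00", :value => 1 }, { :timestamp => "00:15", :value => 2 }],
  [{ :timestamp => "00:00", :value => 1 }]
]

filled = fill_gaps(sample)
p filled[1]
```

Because set + fillers builds a new array, sample itself keeps its gap; only filled contains the nil-valued record.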

Upvotes: 1

Ben Lee

Reputation: 53319

The approach you outlined is correct, and it turns out Ruby is well suited to expressing it elegantly. This would do it, for example:

stamps = values.map{ |logs| logs.map{ |row| row[:timestamp] } }.flatten.uniq.sort
values.map!{ |logs| stamps.map { |ts| logs.find{ |row| row[:timestamp] == ts } || { :timestamp => ts, :value => nil } } }

The first line gets a list of unique timestamps (maps all the logs into just arrays of timestamps, flattens the arrays into a single array, keeps only uniques, and sorts the timestamps).

The second line fills in the gaps (loops through the logs, and for each timestamp in the full list, uses the existing row if that log has one, otherwise inserts a new nil-valued row).
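If the logs get large, scanning each log once per timestamp may become slow. One possible variation, sketched here under the assumption that timestamps are unique within a log (the first row wins if they are not), indexes each log by timestamp first. The names by_stamp, filled and columns are illustrative, and the sample data is shortened:

```ruby
# Shortened sample data, assumed for illustration.
values = [
  [{ :timestamp => "00:00", :value => 1 },
   { :timestamp => "00:15", :value => 2 },
   { :timestamp => "00:30", :value => 3 }],
  [{ :timestamp => "00:00", :value => 1 },
   { :timestamp => "00:30", :value => 3 }],   # gap at 00:15
  [{ :timestamp => "00:00", :value => 1 },
   { :timestamp => "00:15", :value => 2 },
   { :timestamp => "00:30", :value => 3 }]
]

stamps = values.flatten.map { |row| row[:timestamp] }.uniq.sort

filled = values.map do |logs|
  # Index this log's rows by timestamp; ||= keeps the first row on duplicates.
  by_stamp = {}
  logs.each { |row| by_stamp[row[:timestamp]] ||= row }
  # Walk the full timestamp list, substituting a nil-valued row for gaps.
  stamps.map { |ts| by_stamp[ts] || { :timestamp => ts, :value => nil } }
end

# Every inner array now has the same length, so transpose works.
columns = filled.transpose
p columns[1].map { |row| row[:value] }
```

This does a hash lookup per timestamp instead of a scan, at the cost of building one small hash per log.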

Upvotes: 1
