JoaoPhelippe
JoaoPhelippe

Reputation: 57

Iterating over big arrays with limited memory and time of execution

I’m having trouble using Ruby to pass some tests that make the array too big and return an error.

Solution.rb: failed to allocate memory (NoMemoryError)

I have failed to pass it twice.

The problem is about scheduling meetings. The method receives two parameters in order: a matrix with all the first days that investors can meet in the company, and a matrix with all the last days.

For example:

firstDay = [1,5,10]
lastDay = [4,10,10]

This shows that the first investor will be able to find himself between the days 1..4, the second between the days 5..10 and the last one in 10..10.

I need to return the largest number of investors that the company will serve. In this case, all of them can be attended to, the first one on day 1, the second one on day 5, and the last one on day 10.

So far, the code works normally, but with some hidden tests with at least 1000 investors, the error I mentioned earlier appears.

Is there a best practice in Ruby to handle this?

My current code is:

def countMeetings(firstDay, lastDay)
    GC::Profiler.enable
    GC::Profiler.clear

    first = firstDay.sort.first
    last = lastDay.sort.last
    available = []
    
    #Construct the available days for meetings
    firstDay.each_with_index do |d, i|
        available.push((firstDay[i]..lastDay[i]).to_a)
    end
    available = available.flatten.uniq.sort
    investors = {}
    attended_day = []
    attended_investor = []

    #Construct a list of investor based in their first and last days
    firstDay.each_index do |i|
        investors[i+1] = (firstDay[i]..lastDay[i]).to_a
    end

    for day in available
        investors.each do |key, value|
            next if attended_investor.include?(key)
            if value.include?(day)
                next if attended_day.include?(day)
                attended_day.push(day)
                attended_investor.push(key)
            end
        end
    end

    attended_investor.size
end

Using Lazy as far as I could understand, I escaped the MemoryError, but I started receiving a runtime error:

Your code was not executed on time. Allowed time: 10s

And my code look like this:

def countMeetings(firstDay, lastDay)

    loop_size = firstDay.size
    first = firstDay.sort.first
    last = lastDay.sort.last

    daily_attendance = {}
   
    (first..last).each do |day|
        for ind in 0...loop_size
            (firstDay[ind]..lastDay[ind]).lazy.each do |investor_day|
                next if daily_attendance.has_value?(ind)
                if investor_day == day
                    daily_attendance[day] = ind
                end
            end
        end
    end

    daily_attendance.size
end

And it went through the cases with few investors. I thought about using multi-thread and the code became the following:

 def countMeetings(firstDay, lastDay)

    loop_size = firstDay.size
    first = firstDay.sort.first
    last = lastDay.sort.last
    threads = []
    daily_attendance = {}
   
    (first..last).lazy.each_slice(25000) do |slice|
        slice.each do |day|
            threads << Thread.new do
                for ind in 0...loop_size
                    (firstDay[ind]..lastDay[ind]).lazy.each do |investor_day|
                        next if daily_attendance.has_value?(ind)
                        if investor_day == day
                            daily_attendance[day] = ind
                        end
                    end
                end
            end
        end
    end
    threads.each{|t| t.join}
    daily_attendance.size
end

Unfortunately, it went back to the MemoryError.

Upvotes: 1

Views: 261

Answers (1)

Schwern
Schwern

Reputation: 165218

This can be done without consuming any more memory than the range of days. The key is to avoid Arrays and keep things as Enumerators as much as possible.

First, rather than the awkward pair of Arrays that need to be converted into Ranges, pass in an Enumerable of Ranges. This both simplifies the method, and it allows it to be Lazy if the list of ranges is very large. It could be read from a file, fetched from a database or an API, or generated by another lazy enumerator. This saves you from requiring big arrays.

Here's an example using an Array of Ranges.

p count_meetings([(1..4), (5..10), (10..10)])

Or to demonstrate transforming your firstDay and lastDay Arrays into a lazy Enumerable of Ranges...

firstDays = [1,5,10]
lastDays = [4,10,10]
p count_meetings(
  firstDays.lazy.zip(lastDays).map { |first,last|
    (first..last)
  }
)

firstDays.lazy makes everything that comes after lazy. .zip(lastDays) iterates through both Arrays in pairs: [1,4], [5,10], and [10,10]. Then we turn them into Ranges. Because it's lazy it will only map them as needed. This avoids making another big Array.


Now that's fixed, all we need to do is iterate over each Range and increment their attendance for the day.

def count_meetings(attendee_ranges)
  # Make a Hash whose default values are 0.
  daily_attendance = Hash.new(0)

  # For each attendee
  attendee_ranges.each { |range|
    # For each day they will attend, add one to the attendance for that day.
    range.each { |day| daily_attendance[day] += 1 }
  }
  
  # Get the day/attendance pair with the maximum value, and only return the value.
  daily_attendance.max[1]
end

Memory growth is limited to how big the day range is. If the earliest attendee is on day 1 and the last is on day 1000 daily_attendance is just 1000 entries which is a long time for a conference.

And since you've built the whole Hash anyway, why waste it? Write one function that returns the full attendance, and another that extracts the max.

def count_meeting_attendance(attendee_ranges)
  daily_attendance = Hash.new(0)

  attendee_ranges.each { |range|
    range.each { |day| daily_attendance[day] += 1 }
  }
  
  return daily_attendance
end

def max_meeting_attendance(*args)
  count_meeting_attendance(*args).max[1]
end

Since this is an exercise and you're stuck with the wonky arguments, we can do the same trick and lazily zip firstDays and lastDays together and turn them into Ranges.

def count_meeting_attendance(firstDays, lastDays)
  attendee_ranges = firstDays.lazy.zip(lastDays).map { |first,last|
    (first..last)
  }

  daily_attendance = Hash.new(0)
  attendee_ranges.each { |range|
    range.each { |day| daily_attendance[day] += 1 }
  }
  
  return daily_attendance
end

Upvotes: 4

Related Questions