Could reading from disk multiple times cause a bottleneck?

Question

I'm trying to find the bottleneck in a Ruby script. I suspect it comes from the script parsing thousands of lines and, for each one, checking whether a certain file is present on disk and, if so, reading its contents.

def sectionsearch(brand, season, video)
  mytab.trs.each_with_index do |row, i|

    # ...some code goes here...
    f = "modeldesc/" + brand.downcase + "/" + modelcode + ".html"                  
    if File.exist?(f)
      modeldesc = File.read(f)                                                     
    else                                                                           
      modeldesc = ""                                                               
    end 
    # ...more code here...

  end 
end

Given that there are no more than 30 different "modelcode" files for thousands of records, I am looking for a different approach that reads the whole content of the folder once, before the each loop (the folder is not going to change during execution).

Will this approach speed up my script, and is it the right way to implement this?
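For reference, the preload idea described above might look roughly like the sketch below. The helper name preload_model_descriptions is hypothetical, and it assumes every description lives directly in one folder as <modelcode>.html:

```ruby
# Hypothetical helper: reads every .html file in a folder exactly once
# and returns a hash keyed by model code (the file name without ".html").
def preload_model_descriptions(dir)
  Dir.glob(File.join(dir, "*.html")).each_with_object({}) do |path, cache|
    cache[File.basename(path, ".html")] = File.read(path)
  end
end

# Inside the loop, a lookup would then replace File.exist? / File.read:
#   descriptions = preload_model_descriptions("modeldesc/" + brand.downcase)
#   modeldesc    = descriptions.fetch(modelcode, "")
```

fetch with a default of "" keeps the original behaviour for model codes that have no file.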

Doon · Accepted Answer

I would probably use a hash with a default block, which checks for the file the first time an unknown key is accessed:

def sectionsearch(brand, season, video)
  modeldescrs = Hash.new do |cache, model|
    if File.exist?(model)
      cache[model] = File.read(model)
    else
      cache[model] = ''
    end
  end

  mytab.trs.each_with_index do |row, i|
    # ...some code goes here...
    f = "modeldesc/" + brand.downcase + "/" + modelcode + ".html"
    puts modeldescrs[f]
    # ...more code here...
  end
end

Then just access modeldescrs[f] when you need it (the puts above is only an example). If the key doesn't exist, the block is executed: it looks the file up and populates the hash, so later accesses to the same key return the stored value without touching the disk. See http://www.ruby-doc.org/core-2.0/Hash.html for more info on the block form of the initializer for Hash.
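To see that the block really runs only once per key, here is a small sketch that needs no files. The name build_counting_cache and the counter hash are made up for illustration, and key.upcase stands in for the File.read call:

```ruby
# Hypothetical demo: counts how often the default block runs.
# counter is a plain hash such as { misses: 0 }.
def build_counting_cache(counter)
  Hash.new do |hash, key|
    counter[:misses] += 1     # block runs only when the key is missing
    hash[key] = key.upcase    # stand-in for the expensive File.read
  end
end

counter = { misses: 0 }
cache = build_counting_cache(counter)
3.times { cache["abc"] }  # first access runs the block, the next two hit the hash
cache["xyz"]              # a new key runs the block again
```

After these four lookups counter[:misses] is 2, one per distinct key. Note that the block must assign hash[key] itself; if it only returned the value, nothing would be stored and it would run on every access.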

Also, you could make modeldescrs an instance variable if the cached contents need to persist across calls.
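A minimal sketch of the instance-variable variant, assuming the method lives in a class (SectionSearcher and model_description are hypothetical names, not from the question):

```ruby
# Hypothetical class: the cache is built lazily and kept in an
# instance variable, so it survives across method calls on one object.
class SectionSearcher
  # Returns the file contents for path, or "" if the file does not exist.
  def model_description(path)
    model_descriptions[path]
  end

  private

  def model_descriptions
    @model_descriptions ||= Hash.new do |cache, path|
      cache[path] = File.exist?(path) ? File.read(path) : ""
    end
  end
end
```

Each SectionSearcher object reads a given file at most once, no matter how many times sectionsearch-style methods ask for it. The flip side is that changes made to a file after its first read are not picked up by that object.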
