Reputation: 4009
It's a vague question I know....but the performance on this block of code is horrible. It takes about 15secs from the original post to the action to rendering the page...
The purpose of this action is to retrieve all Occupations from a CV, all the skills from that CV and the occupations. They need to be organized in 2 arrays:
the first array contains all the Occupations (no duplicates) and has them ordered according their score. Fo each double entry found the score is increased by 1
the second array contains ALL the skills from both the occupation array and the cv. Again no doubles are allowed, but for every double encountered the score of the existing is increased by one.
Below is the code block that performs this operation. It's relatively big compared to my other code snippets, but i hope it's understandable. I know working with the arrays like i do is confusing, but here is what each array location means:
position 3 : the location found in the cv
def categorize
@cv = Cv.find(params[:cv_id], :include => [:desired_occupations, :past_occupations, :educational_skills]) @menu = :second @language = Language.resolve(:code => :en, :name => :en) @occupation_hashes = [] @skill_hashes = [] (@cv.desired_occupations + @cv.past_occupations).each do |occupation| section = [] section << 'Desired occupation' if @cv.desired_occupations.include? occupation section << 'Work experience' if @cv.past_occupations.include? occupation unless (array = @occupation_hashes.assoc(occupation)).blank? array[1] += 1 array[2] = (array[2] & section).uniq else @occupation_hashes << [occupation, 1, section] end occupation.skills.each do |skill| unless (array = @skill_hashes.assoc skill).blank? label = occupation.concept.label(@language).value array[1]+= 1 array[3] << label unless array[3].include? label else @skill_hashes << [skill, 1, [], [occupation.concept.label(@language).value]] end end end @cv.educational_skills.each do |skill| unless (array = @skill_hashes.assoc skill).blank? array[1]+= 1 array[3] << 'Education skills' unless array[3].include? 'Education skills' else @skill_hashes << [skill, 1, ['Education skills'], []] end end # Sort the hashes @occupation_hashes.sort! { |x,y| y[1] <=> x[1]} @skill_hashes.sort! { |x,y| y[1] <=> x[1]} @max = @skill_hashes.first[1] @min = @skill_hashes.last[1] end
I can post the additional models and migrations to make it clear what each class does, but I think the first few lines of the above script should be clear on the associations. I'm looking for a way to optimize the each-loops...
Upvotes: 0
Views: 146
Reputation: 14736
You should do a but of profiling in your code to see what is taking a large chunk of time. You can figure out how to work on of the profilers, or just sprinkle some simple puts
or logger.info
statements throughout your code with a timestamp. Probably easiest to do this by using Benchmark
. Note: you may need to require 'benchmark'
... not sure if it is auto required in Rails or not.
For a single line, you can do something like this:
logger.info Benchmark.measure { @cv = Cv.find(params[:cv_id], :include => [:desired_occupations, :past_occupations, :educational_skills]) }
And for timing larger blocks of code:
logger.info Benchmark.measure do
(@cv.desired_occupations + @cv.past_occupations).each do |occupation|
section = []
section << 'Desired occupation' if @cv.desired_occupations.include? occupation
section << 'Work experience' if @cv.past_occupations.include? occupation
unless (array = @occupation_hashes.assoc(occupation)).blank?
array[1] += 1
array[2] = (array[2] & section).uniq
else
@occupation_hashes << [occupation, 1, section]
end
end
end
I'd just start with large blocks and then narrow it down. Not knowing how large of a dataset you are dealing with, it is hard to say what the problem zone is.
I'll also concur with others that you will be way better off to break this thing into smaller methods. This will also make it easier to test for performance, as you can do things like:
Benchmark.measure { 10000.times { foo.do_that_thing_that_might_be_slow }}
Upvotes: 1
Reputation: 211740
That's quite the block of code there. Generally if you're writing methods that serious you're going to have trouble maintaining it in the future. A technique that would help is breaking up that monolithic chunk of code and turning it into a helper class that does the processing in more logical stages, making it easier to fine-tune aspects of it.
For instance, an interface might be:
@categorizer = CvCategorizer.new(params[:cv_id])
This would encapsulate all of the above and save it into instance variables made accessible by being declared with attr_reader
.
Using a utility class means you can break up the initialization into steps that are made more clear:
def initialize(cv_id)
# Call a wrapper method that loads the CV
@cv = self.load_cv(cv_id)
# Perform discrete steps to re-order the imported data
self.organize_occupations
self.organize_skills
end
It's really hard to say why this is slow by just looking at it, though I would pay very close attention to log/development.log
to see what's going on in there. It could be the initial load is painfully slow but the rest of the method is fine.
Upvotes: 1