Reputation: 2269
I am parsing a large CSV file in a Ruby script and need to find the closest match for a title from a set of search keys. There may be one or more search keys, and they may not match the title exactly (a close match is enough), as shown below:
search_keys = ["big", "bear"]
I have a large array of data to search through, and I only want to search on the title column:
array = [
["id", "title", "code", "description"],
["1", "once upon a time", "3241", "a classic story"],
["2", "a big bad wolf", "4235", "a little scary"],
["3", "three big bears", "2626", "a heart warmer"]
]
In this case I would want it to return the row ["3", "three big bears", "2626", "a heart warmer"]
as this is the closest match to my search keys.
Are there any helpers/libraries/gems I can use? Has anyone done this before?
Upvotes: 5
Views: 1171
Reputation: 42182
Here is my one-line shot
p array.find_all {|a|a.join.scan(/#{search_keys.join("|")}/).length==search_keys.length}
=>[["3", "three big bears", "2626", "a heart warmer"]]
To get all the rows ordered by number of matches (most matches first):
p array.drop(1).sort_by {|a|a.join.scan(/#{search_keys.join("|")}/).length}.reverse
Does anyone know how to adapt the last solution so that rows containing none of the keys are dropped, while keeping it as concise as it is?
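One possible way to do that, offered as a sketch rather than a definitive answer: count the distinct keys found in each row, reject the rows with zero hits, then sort. The names `pattern` and `ranked` are made up for this example. `Regexp.union` is used instead of string interpolation so that keys containing regex metacharacters are escaped, and `uniq` keeps a repeated key from being counted twice.

```ruby
array = [
  ["id", "title", "code", "description"],
  ["1", "once upon a time", "3241", "a classic story"],
  ["2", "a big bad wolf", "4235", "a little scary"],
  ["3", "three big bears", "2626", "a heart warmer"]
]
search_keys = ["big", "bear"]

# Regexp.union escapes each key and joins them with "|".
pattern = Regexp.union(search_keys)

# Skip the header row, pair each row with its number of distinct key hits,
# drop rows with no hits, and put the best match first.
ranked = array.drop(1)
              .map { |row| [row, row.join(" ").scan(pattern).uniq.length] }
              .reject { |_row, hits| hits.zero? }
              .sort_by { |_row, hits| -hits }
              .map(&:first)

p ranked
```

Like the one-liners above, this searches every column of the row; replacing `row.join(" ")` with `row[1]` would restrict it to the title column.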
Upvotes: 1
Reputation: 1719
You could probably write it in a more succinct way...
array = [
["id", "title", "code", "description"],
["1", "once upon a time", "3241", "a classic story"],
["2", "a big bad wolf", "4235", "a little scary"],
["3", "three big bears", "2626", "a heart warmer"]
]
search_keys = ["big", "bear"]
def sift(records, target_field, search_keys)
# find target_field index
target_field_index = nil
records.first.each_with_index do |e, i|
if e == target_field
target_field_index = i
break
end
end
if target_field_index.nil?
raise "Target field was not found"
end
# sums up which records have a match and how many keys they match
# key => val = record => number of keys matched
counter = Hash.new(0) # each new hash key is init'd with value of 0
records.each do |record| # look at all our given records
search_keys.each do |key| # check each search key on the field
if record[target_field_index].include?(key)
counter[record] += 1 # found a key, init to 0 if required and increment count
end
end
end
# find the result with the most search key matches
top_result = counter.to_a.reduce do |top, record|
if record[1] > top[1] # [0] = record, [1] = key hit count
top = record # set to new top
end
top # continue with reduce
end.first # only care about the record (not the key hit count)
end
puts "Top result: #{sift array, 'title', search_keys}"
# => Top result: ["3", "three big bears", "2626", "a heart warmer"]
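For comparison, here is a more compact sketch of the same idea. The method name `sift_compact` is hypothetical, and unlike the version above it skips the header row and returns nil when no row matches any key (that behaviour is an assumption about what is wanted, not something the question specifies).

```ruby
array = [
  ["id", "title", "code", "description"],
  ["1", "once upon a time", "3241", "a classic story"],
  ["2", "a big bad wolf", "4235", "a little scary"],
  ["3", "three big bears", "2626", "a heart warmer"]
]
search_keys = ["big", "bear"]

# Hypothetical compact variant of sift: same scoring, fewer moving parts.
def sift_compact(records, target_field, search_keys)
  header, *rows = records
  index = header.index(target_field)
  raise ArgumentError, "Target field was not found" if index.nil?

  # Score each data row by how many search keys its target field contains.
  best = rows.max_by { |row| search_keys.count { |key| row[index].include?(key) } }

  # Return nil when there are no rows or even the best row matches nothing.
  return nil if best.nil? || search_keys.none? { |key| best[index].include?(key) }

  best
end

p sift_compact(array, "title", search_keys)
```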
Upvotes: 1
Reputation: 4877
This works. It finds and returns an array of matched* rows as result.
*matched row = a row whose id, title, code or description contains ANY of the provided search_keys, including partial matches such as 'bear' in 'bears'.
result = []
array.each do |a|
a.each do |i|
search_keys.each do |k|
result << a if i.include?(k)
end
end
end
result.uniq!
Upvotes: 1
Reputation: 2349
I am worried that this task should really be handled by a search engine at the database level or similar; there is little point in fetching the data into the app and searching across columns and rows there, as that is likely to be expensive. But for now, here is a plain, simple approach :)
array = [
["id", "title", "code", "description"],
["1", "once upon a time", "3241", "a classic story"],
["2", "a big bad wolf", "4235", "a little scary"],
["3", "three big bears", "2626", "a heart warmer"]
]
h = {}
search_keys = ["big", "bear"]
array[1..-1].each do |rec|
rec_id = rec[0].to_i
search_keys.each do |key|
if rec[1].include? key
h[rec_id] = h[rec_id] ? (h[rec_id]+1) : 1
end
end
end
closest = h.keys.first
h.each do |rec, count|
closest = rec if h[closest] < h[rec]
end
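# look the row up by its id rather than by position, since ids need not equal row indexes
array.find { |rec| rec[0].to_i == closest } # => desired output :)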
Upvotes: 2
Reputation: 1294
I think you can do it yourself without any gems. This may be close to what you need: search the array for the keys and assign a rank to each matching element.
result = []
array.each do |ar|
rank = 0
search_keys.each do |key|
if ar[1].include?(key)
rank += 1
end
end
if rank > 0
result << [rank, ar]
end
end
This code could be written more concisely, but I wanted to show you the details.
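The loop above only collects [rank, row] pairs; it does not yet pick a winner. A minimal sketch of that last step, assuming the title is always in column 1 and that the header row should be skipped (the names `ranked_rows` and `closest` are made up for this example):

```ruby
array = [
  ["id", "title", "code", "description"],
  ["1", "once upon a time", "3241", "a classic story"],
  ["2", "a big bad wolf", "4235", "a little scary"],
  ["3", "three big bears", "2626", "a heart warmer"]
]
search_keys = ["big", "bear"]

# Same ranking as above, with the inner loop condensed into a count.
result = []
array.drop(1).each do |row|
  rank = search_keys.count { |key| row[1].include?(key) }
  result << [rank, row] if rank > 0
end

# Highest rank first; the closest match is the first row, or nil if nothing matched.
ranked_rows = result.sort_by { |rank, _row| -rank }.map { |_rank, row| row }
closest = ranked_rows.first

p closest
# => ["3", "three big bears", "2626", "a heart warmer"]
```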
Upvotes: 1