Reputation: 539
Sorry if this has already been asked.
My current code is:
require "fileutils"

directory = "disease" # Name of the output directory
FileUtils.mkpath(directory) # Creates the directory if it doesn't exist
cancer = Eightk.where("text ilike '%cancer%'")
died = Eightk.where("text ilike '%died%'")
cancer.each do |filing| # each filing is one Eightk record
  filename = "#{directory}/#{filing.doc_id}.html"
  File.open(filename, "w") { |f| f.puts filing.text } # block form closes the file
  puts "Storing #{filing.doc_id}..."
end
died.each do |filing| # same steps again, only the keyword differs
  filename = "#{directory}/#{filing.doc_id}.html"
  File.open(filename, "w") { |f| f.puts filing.text }
  puts "Storing #{filing.doc_id}..."
end
But this has two problems:
It doesn't match the exact word (ilike '%died%' also matches any word that merely contains "died").
It is very time consuming, since it means copying the same code and changing just one word.
So I have tried using Regexp.union as follows but am a bit lost
directory = "disease" # Name of the output directory
FileUtils.mkpath(directory) # Creates the directory if it doesn't exist
keywords = [/dead/, /killed/, /cancer/]
re = Regexp.union(keywords)
So I am trying to search the text files for these keywords and then copy the text documents.
Any help is really appreciated.
Upvotes: 1
Views: 69
Reputation: 1576
Since you said:
I have about 1 million text documents contained in psql
and use the "ilike" operator to search for words in those documents.
IMHO, that is an inefficient implementation: because your data is huge, your query will process all 1 million text documents for every search, so it will be very slow.
Before moving forward, I think you should take a look at PG Full Text Searching first (if you simply want to use the full text search built into PG). You could also take a look at other products such as Elasticsearch or Solr that are dedicated to the text search problem.
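As a rough sketch of what the built-in full text search could look like for the keywords in your question: the helper name build_tsquery is made up for illustration, while the Eightk model and the text column come from your code. The ActiveRecord call is left as a comment because it needs a database connection, and I am assuming the 'english' text search configuration.

```ruby
# Hypothetical helper: joins keywords into an OR-ed tsquery string.
# Assumes each keyword is a single plain word (no spaces or operators).
def build_tsquery(keywords)
  keywords.map { |word| word.strip.downcase }
          .reject(&:empty?)
          .uniq
          .join(" | ")
end

query = build_tsquery(%w[dead killed cancer])
puts query

# With a database connection, the search might then look like this
# (assumption: the 'english' configuration suits your documents):
# Eightk.where("to_tsvector('english', text) @@ to_tsquery('english', ?)", query)
```

Full text search compares whole (stemmed) words rather than substrings, which also addresses the "exact word" problem, and adding a keyword only means extending the list.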
Regarding PG full text search in Ruby, you could use the pg_search gem. If you use Rails, I also wrote a post about a simple full text search implementation with PG in Rails.
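As for the Regexp.union attempt in your question, here is a minimal sketch of how it could match whole words only on the Ruby side. It only makes sense for a set of documents small enough to load into memory. The helper names keyword_regex and store_matches are hypothetical, and the hash of doc_id => text is an assumed stand-in for your records.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: builds one case-insensitive pattern from plain-string
# keywords. Regexp.union escapes the strings; the \b anchors restrict
# matches to whole words, so "dead" does not match "deadline".
def keyword_regex(keywords)
  /\b(?:#{Regexp.union(keywords).source})\b/i
end

# Hypothetical helper: writes every matching document into `directory`
# and returns the paths written. `docs` is assumed to be a hash of
# doc_id => text standing in for the database records.
def store_matches(docs, directory, pattern)
  FileUtils.mkpath(directory)
  docs.select { |_doc_id, text| text.match?(pattern) }.map do |doc_id, text|
    filename = File.join(directory, "#{doc_id}.html")
    File.write(filename, text)
    filename
  end
end

# Made-up sample data, for illustration only.
sample = {
  "a1" => "He died of Cancer last year.",
  "b2" => "The deadline was missed.",
  "c3" => "Two workers were killed."
}

Dir.mktmpdir do |dir|
  written = store_matches(sample, dir, keyword_regex(%w[dead killed cancer]))
  written.each { |path| puts "Stored #{File.basename(path)}" }
end
```

Adding another keyword is then a one-word change to the list instead of another copied loop.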
I hope you may find this useful.
Upvotes: 1