CISCO

Reputation: 539

Ruby and RegExp

Sorry if this has already been asked.

My current code is:

  require "fileutils"

  directory = "disease"          # Name of the output directory
  FileUtils.mkpath(directory)    # Creates the directory if it doesn't exist

  cancer = Eightk.where("text ilike '%cancer%'")
  died = Eightk.where("text ilike '%died%'")

  cancer.each do |filing|        # filing is a single Eightk record
    filename = "#{directory}/#{filing.doc_id}.html"
    File.open(filename, "w") { |file| file.puts filing.text }   # block form closes the file
    puts "Storing #{filing.doc_id}..."
  end

  died.each do |filing|          # same steps for the second result set
    filename = "#{directory}/#{filing.doc_id}.html"
    File.open(filename, "w") { |file| file.puts filing.text }
    puts "Storing #{filing.doc_id}..."
  end

But this is not working for the following

So I have tried using Regexp.union as follows, but I am a bit lost:

    directory = "disease"           # Name of the output directory
    FileUtils.mkpath(directory)     # Creates the directory if it doesn't exist


    keywords = [/dead/,/killed/,/cancer/]

    re = Regexp.union(keywords)     # Regexp is a constant, so it must be capitalized

So I am trying to search the text files for these keywords and then copy the text documents.
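A minimal sketch of how the Regexp.union idea could be finished, shown only as an illustration. `Filing` and `store_matching` are hypothetical names, and `Filing` is a plain Struct standing in for the Eightk records so the snippet runs without a database. The `/i` flag is an assumption, added so that matching is case-insensitive like ILIKE.

```ruby
require "fileutils"

# Hypothetical stand-in for an Eightk record; only doc_id and text are used.
Filing = Struct.new(:doc_id, :text)

# The /i flag makes each pattern case-insensitive, similar to ILIKE.
KEYWORDS = [/dead/i, /killed/i, /cancer/i]
PATTERN = Regexp.union(KEYWORDS)

# Writes every filing whose text matches PATTERN into directory
# and returns the doc_ids that were stored.
def store_matching(filings, directory)
  FileUtils.mkpath(directory)
  filings.select { |filing| filing.text =~ PATTERN }.map do |filing|
    File.write(File.join(directory, "#{filing.doc_id}.html"), filing.text)
    filing.doc_id
  end
end
```

With the real model, something like `Eightk.find_each` would presumably supply the records one batch at a time, but every row would still be loaded into Ruby to be checked against the pattern.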

Any help is really appreciated.

Upvotes: 1

Views: 69

Answers (1)

zdk

Reputation: 1576

Since you said:

I have about 1 million text documents contained in psql

and you use the ILIKE operator to search for words in those documents:

IMHO, that is an inefficient implementation. Because your data set is huge, each query has to process all 1 million text documents, so every search will be very slow.

Before moving forward, I think you should take a look at PostgreSQL Full Text Search first (if you simply want to use the full text search built into PG), or you could look at other products such as Elasticsearch or Solr that are dedicated to the text search problem.

Regarding PG full text search, in Ruby you could use the pg_search gem. If you use Rails, I also wrote a post about a simple full text search implementation with PG in Rails.
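To give a rough idea of what the built-in full text search looks like without any gem, here is a small sketch. `tsquery_for` and `FTS_CONDITION` are hypothetical names, the `english` text search configuration is an assumption, and the keywords are assumed to be single plain words (anything with spaces or operator characters would need escaping before going into `to_tsquery`).

```ruby
# Hypothetical helper: joins plain single-word keywords with "|",
# the OR operator of PostgreSQL's tsquery syntax.
def tsquery_for(keywords)
  keywords.map { |word| word.strip.downcase }.reject(&:empty?).join(" | ")
end

# SQL fragment for ActiveRecord's where; "text" is the column from the question.
FTS_CONDITION = "to_tsvector('english', text) @@ to_tsquery('english', ?)"

# With a database connection this could be used roughly as:
#   Eightk.where(FTS_CONDITION, tsquery_for(%w[dead killed cancer]))
```

Note that full text search matches stemmed words, so results can differ from ILIKE, and it will only be fast if there is a GIN index on the same `to_tsvector` expression.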

I hope you may find this useful.

Upvotes: 1
