Ilja KO

Reputation: 1616

How to make my DB query method in Rails more efficient

I am running a query against my PostgreSQL database. My app has Articles, and each Article can have a number of Hashtags. Those relations are stored in a join table between Hashtags and Articles.

I have a working method that returns the Articles that have certain hashtags, or all Articles that do not have any of those hashtags:

  def test(hashtags, include = true)
    articles = []
    hashtags.split(' ').each do |h|
      articles += Article.joins(:hashtags).where('LOWER(hashtags.value) LIKE LOWER(?)', "#{h}")
    end
    if include
      articles.uniq
    else
      (Article.all.to_set - articles.uniq.to_set).to_a
    end
  end

I could call it like this:

test("politics people china", true)

It returns all Articles that have at least one of those hashtags.

Or I could call it like this:

test("politics people china", false)

It returns all Articles EXCEPT those that have at least one of these hashtags.

It works well, but I don't think this is very efficient as I do so much in Ruby and not on DB level. (Each hashtag triggers its own query, and the exclusion case loads every article into memory.)

I tried this:

def test2(hashtags, include = true)
  articles = []
  pattern = ''
  hashtags.split(' ').each do |h|
    pattern += "#{h}|"
  end
  pattern = '(' + pattern[0...-1] + ')'

  if include
    articles = Article.joins(:hashtags).where('hashtags.value ~* ?', "#{pattern}")
  else
    articles = Article.joins(:hashtags).where('hashtags.value !~* ?', "#{pattern}")
  end

  articles.uniq
end

But it does not behave as I expected. First of all, if I call it like this:

test2("politics china", true)

It doesn't only return the Articles that have the hashtag politics or china, but also all Articles that have a hashtag containing one of the letters in politics or china, as if the pattern were:

(p|o|l|i|t|c|s|h|n|a)

But it should check for the whole words, and the console shows that the pattern actually looks like this:

(politics|china)

Yet it doesn't match that way, which I find strange, to be honest.
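
One plausible explanation: PostgreSQL's `~*` performs an unanchored, case-insensitive search, so a pattern matches if it matches any substring of the value. `(politics|china)` therefore also matches hashtags such as `geopolitics` or `chinatown` that merely contain one of the words. The sketch below reproduces the effect with Ruby's own regexes (a different regex flavor from PostgreSQL's, but alternation and `^`/`$` anchoring behave the same for single-line values); the hashtag values are hypothetical.

```ruby
# Hypothetical hashtag values, for illustration only.
values = %w[politics geopolitics china chinatown people sports]
words = 'politics china'.split(' ')
alternation = words.map { |w| Regexp.escape(w) }.join('|')

# Unanchored: matches if either word occurs anywhere in the value,
# like an unanchored (politics|china) pattern used with PostgreSQL's ~*.
unanchored = Regexp.new("(#{alternation})", Regexp::IGNORECASE)

# Anchored: the whole value must be one of the words.
anchored = Regexp.new("^(#{alternation})$", Regexp::IGNORECASE)

substring_matches = values.grep(unanchored)
exact_matches = values.grep(anchored)

p substring_matches # → ["politics", "geopolitics", "china", "chinatown"]
p exact_matches     # → ["politics", "china"]
```

Anchoring the SQL pattern the same way, as `^(politics|china)$`, should restrict `~*` to whole hashtags; for plain case-insensitive equality, comparing `LOWER(hashtags.value)` against a list of lowercased words avoids regexes altogether.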

And with

test2("politics", false)

It only returns articles that have one or more hashtags associated with them, BUT leaves out the ones that have no hashtags at all.
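
The `!~*` version applies the negation to each joined (article, hashtag) row rather than to each article. An inner join yields no rows for an article without hashtags, so such articles can never appear, while an article tagged both `politics` and `sports` still appears through its `sports` row. The plain-Ruby sketch below models the join table as an array of pairs (the ids and tags are hypothetical) and contrasts that row-level filter with the intended "all articles minus the matching ones" semantics of `test(..., false)`.

```ruby
require 'set'

# Hypothetical data: all article ids, plus (article_id, hashtag) join-table rows.
article_ids = [1, 2, 3, 4]
joined_rows = [
  [1, 'politics'],
  [1, 'sports'],
  [2, 'china'],
  [3, 'sports']
  # article 4 has no hashtags, so an inner join produces no rows for it
]
excluded = %w[politics china].to_set

# Row-level negation (what !~* on an inner join does): keep any article
# that has at least one hashtag outside the excluded set.
row_level = joined_rows.reject { |_, tag| excluded.include?(tag) }.map(&:first).uniq

# Set-level exclusion (what test(..., false) intends): drop every article
# that has at least one excluded hashtag, and keep untagged articles.
matching = joined_rows.select { |_, tag| excluded.include?(tag) }.map(&:first).to_set
set_level = article_ids.reject { |id| matching.include?(id) }

p row_level # → [1, 3]
p set_level # → [3, 4]
```

The set-level result needs the full list of articles as its starting point, which is why a query that only walks the join cannot produce it.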

Can someone help me make my working method more efficient?

EDIT: Here is my updated code, as suggested in an answer:

def test2(hashtags, include = false)
  hashtags =
    if include
      Hashtag.where("LOWER(value) iLIKE ANY ( array[?] )", hashtags)
    else
      Hashtag.where("LOWER(value) NOT iLIKE ANY ( array[?] )", hashtags)
    end
  Slot.joins(:hashtags).merge(hashtags).distinct
end

Unfortunately, it still doesn't return the Articles that have NO hashtags at all when include is false.
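
Besides the inner join (which drops untagged articles before any filter runs), the quantifier is worth checking. In PostgreSQL, `x NOT ILIKE ANY (array[...])` is true when `x` fails to match at least one element, so with two or more hashtags almost every value passes; "matches none of them" is `NOT (x ILIKE ANY (...))`, or equivalently `x NOT ILIKE ALL (...)`. Also, ActiveRecord expands `array[?]` into several elements only when given an Array; a single string such as `"politics people china"` becomes a one-element array. The plain-Ruby sketch below models only the quantifier semantics, using case-insensitive equality as a stand-in for ILIKE patterns without wildcards (the helper `ilike?` is hypothetical).

```ruby
# Hypothetical stand-in for ILIKE without wildcards: case-insensitive equality.
def ilike?(value, pattern)
  value.casecmp?(pattern)
end

patterns = %w[politics china]
values   = %w[politics china sports]

# value NOT ILIKE ANY (patterns): true if value fails to match at least one pattern.
not_ilike_any = values.select { |v| patterns.any? { |pat| !ilike?(v, pat) } }

# NOT (value ILIKE ANY (patterns)), i.e. value NOT ILIKE ALL (patterns):
# true only if value matches none of the patterns.
not_ilike_all = values.select { |v| patterns.none? { |pat| ilike?(v, pat) } }

p not_ilike_any # → ["politics", "china", "sports"]
p not_ilike_all # → ["sports"]
```

Fixing the quantifier alone does not bring back untagged articles; that still requires computing all articles minus the matching ones.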

Upvotes: 1

Views: 98

Answers (2)

Ilya Konyukhov

Reputation: 2791

You are right about

I don't think this is very efficient as I do so much in Ruby and not on DB level.

ActiveRecord works nicely for simple queries, but when things get complex it's reasonable to use plain SQL. So let's try to build queries that match your test cases:

1) For the call test("politics people china", true), the query may look like this:

SELECT DISTINCT ON (AR.id) AR.*
FROM articles AR
  JOIN articles_hashtags AHSH ON AHSH.article_id = AR.id
  JOIN hashtags HSH ON HSH.id = AHSH.hashtag_id
WHERE LOWER(HSH.value) IN ('politics', 'people', 'china')
ORDER BY AR.id;

(I'm not sure what your join table is named, so I'm assuming it is articles_hashtags.)

Plain and simple: we take rows from the articles table, inner-join them with articles_hashtags and hashtags, and use the WHERE condition to keep only the hashtags we want; the result is every article that has at least one of those hashtags. The number of hashtags doesn't matter: the IN clause works just as well when the list contains a single hashtag.

Please note the DISTINCT ON: it's necessary to remove duplicate articles from the result set when the same article has more than one hashtag from the given list.
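
Why the duplicates appear: the joins yield one row per (article, matching hashtag) pair, so an article tagged with both politics and china comes back twice. `DISTINCT ON (AR.id)` keeps only the first row of each `AR.id` group in `ORDER BY` order. The plain-Ruby sketch below mimics this on hypothetical rows; since the query selects only article columns, the duplicate rows are identical and it does not matter which one is kept.

```ruby
# Hypothetical output of the two inner joins: one row per matching hashtag.
# Only article columns are selected (AR.*), so duplicate rows are identical.
joined_rows = [
  { id: 2, title: 'Crowds' },      # matched via 'people'
  { id: 1, title: 'Trade talks' }, # matched via 'politics'
  { id: 1, title: 'Trade talks' }  # matched via 'china'
]

# ORDER BY AR.id, then keep the first row of each id group (DISTINCT ON (AR.id)).
distinct_articles = joined_rows.sort_by { |r| r[:id] }.uniq { |r| r[:id] }

p distinct_articles.map { |r| r[:id] } # → [1, 2]
```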

2) For the call test("politics people china", false), the query is a bit more complex. It needs to exclude articles that have any of the given hashtags, so it should return articles with other hashtags as well as articles with no hashtags at all. To keep things simple, we can reuse the previous query as a subquery:

SELECT A.*
FROM articles A
WHERE A.id NOT IN (
    SELECT DISTINCT ON (AR.id) AR.id
    FROM articles AR
      JOIN articles_hashtags AHSH ON AHSH.article_id = AR.id
      JOIN hashtags HSH ON HSH.id = AHSH.hashtag_id
    WHERE LOWER(HSH.value) IN ('politics', 'people', 'china')
    ORDER BY AR.id
);

Here we fetch all articles except those that have any of the given hashtags.

3) Converting these queries to a Ruby method gives us the following:

def test3(hashtags, include = true)
  # code guard to prevent SQL-error when there are no hashtags given
  if hashtags.nil? || hashtags.strip.blank?
    return include ? [] : Article.all.to_a
  end

  basic_query = "
    SELECT DISTINCT ON (AR.id) AR.*
    FROM #{Article.table_name} AR
      JOIN articles_hashtags AHSH ON AHSH.article_id = AR.id
      JOIN #{Hashtag.table_name} HSH ON HSH.id = AHSH.hashtag_id
    WHERE LOWER(HSH.value) IN (:hashtags)
    ORDER BY AR.id"

  query = if include
            basic_query
          else
            "SELECT A.*
            FROM #{Article.table_name} A
            WHERE A.id NOT IN (#{basic_query.sub('AR.*', 'AR.id')})"
          end

  hashtag_arr = hashtags.split(' ').map(&:downcase) # to convert hashtags string into a list

  Article.find_by_sql [query, { hashtags: hashtag_arr }]
end

The method above returns an array of the articles matching your conditions (possibly an empty one).

Upvotes: 2

Rodrigo
Rodrigo

Reputation: 4802

Try this:

def test(hashtags, include = true)
  hashtags = 
    if include 
      Hashtag.where("LOWER(value) iLIKE ANY ( array[?] )", hashtags)
    else
      Hashtag.where("LOWER(value) NOT iLIKE ANY ( array[?] )", hashtags)
    end
  Article.joins(:hashtags).merge(hashtags).distinct
end

Upvotes: 1
