Reputation: 1616
I am querying my PostgreSQL database. My app has Articles, and each Article can have any number of Hashtags. These relations are stored in a join table between Hashtags and Articles.
I have a working method that returns either the Articles that have certain hashtags or all Articles that do not have certain hashtags:
def test(hashtags, include = true)
  articles = []
  hashtags.split(' ').each do |h|
    articles += Article.joins(:hashtags).where('LOWER(hashtags.value) LIKE LOWER(?)', "#{h}")
  end
  if include
    articles.uniq
  else
    (Article.all.to_set - articles.uniq.to_set).to_a
  end
end
I could call it like this:
test("politics people china", true)
This returns all Articles that have at least one of those hashtags.
Or I could call it like this:
test("politics people china", false)
This returns all Articles EXCEPT those that have one of these hashtags.
It works well, but I don't think it is very efficient, because so much of the work happens in Ruby and not at the DB level.
I tried this:
def test2(hashtags, include = true)
  articles = []
  pattern = ''
  hashtags.split(' ').each do |h|
    pattern += "#{h}|"
  end
  pattern = '(' + pattern[0...-1] + ')'
  if include
    articles = Article.joins(:hashtags).where('hashtags.value ~* ?', "#{pattern}")
  else
    articles = Article.joins(:hashtags).where('hashtags.value !~* ?', "#{pattern}")
  end
  articles.uniq
end
But it does not behave as I expected. First of all, if I call it like this:
test2("politics china", true)
it returns not only the Articles that have the hashtag politics or china, but also all Articles that have a hashtag containing any of the letters in politics or china, as if the pattern were:
(p|o|l|i|t|c|s|h|n|a)
That is strange, because the pattern I see in the console is actually this, which is what it should check for:
(politics|china)
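A note on the pattern itself: the `~*` operator matches the regular expression anywhere inside the value, so `(politics|china)` also matches any hashtag that merely contains one of those words. That may explain at least some of the unexpected rows. The sketch below shows one way to build an anchored, escaped pattern. The helper name `anchored_hashtag_pattern` is made up for illustration, and the check uses Ruby's own regex engine as a stand-in for PostgreSQL's, which behaves the same way for simple patterns like these.

```ruby
# Hypothetical helper (not part of the original code): builds an anchored,
# escaped alternation from a space-separated hashtag string.
def anchored_hashtag_pattern(hashtags)
  alternatives = hashtags.split.map { |h| Regexp.escape(h) }
  "^(#{alternatives.join('|')})$"
end

pattern = anchored_hashtag_pattern("politics china")
anchored = Regexp.new(pattern, Regexp::IGNORECASE)
unanchored = /(politics|china)/i

puts pattern
puts anchored.match?("China")        # whole value matches, case-insensitively
puts anchored.match?("chinatown")    # rejected: the value only contains "china"
puts unanchored.match?("chinatown")  # accepted by the unanchored pattern
```

With such a pattern, the existing `where('hashtags.value ~* ?', pattern)` call would compare whole hashtag values instead of substrings.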
And with
test2("politics", false)
It returns only Articles that have one or more hashtags associated with them, and leaves out the ones that have no hashtags at all.
Can someone help me make my working method more efficient?
EDIT: Here is my updated code, as suggested in an answer:
def test2(hashtags, include = false)
  hashtags =
    if include
      Hashtag.where("LOWER(value) iLIKE ANY ( array[?] )", hashtags)
    else
      Hashtag.where("LOWER(value) NOT iLIKE ANY ( array[?] )", hashtags)
    end
  Slot.joins(:hashtags).merge(hashtags).distinct
end
Unfortunately, it still does not return Articles that have NO hashtags at all when include is false.
Upvotes: 1
Views: 98
Reputation: 2791
You are right about
I dont think this is very efficient as I do so much in Ruby and not on DB level.
ActiveRecord works nicely for simple queries, but when things get complex it's reasonable to use plain SQL. So let's try to build a query that matches your test cases:
1) For the call test("politics people china", true), the query may look like:
SELECT DISTINCT ON (AR.id) AR.*
FROM articles AR
JOIN articles_hashtags AHSH ON AHSH.article_id = AR.id
JOIN hashtags HSH ON HSH.id = AHSH.hashtag_id
WHERE LOWER(HSH.value) IN ('politics', 'people', 'china')
ORDER BY AR.id;
(I'm not sure how your join table is named, so I'm assuming it is articles_hashtags.)
Plain and simple: we take data from the articles table using two inner joins with articles_hashtags and hashtags, plus a WHERE condition that filters for the hashtags we want to see; this gives us all articles with those hashtags. It doesn't matter how many hashtags we filter on: the IN operator works well even if there is only one hashtag in the list.
Please note DISTINCT ON: it's necessary for removing duplicate articles from the result set in case the same article has more than one hashtag from the given list.
2) For the call test("politics people china", false), the query is a bit more complex. It needs to exclude articles that have the given hashtags. Hence it should return articles with different hashtags as well as articles without any hashtags at all. To keep things simple, we can reuse the previous query:
SELECT A.*
FROM articles A
WHERE A.id NOT IN (
SELECT DISTINCT ON (AR.id) AR.id
FROM articles AR
JOIN articles_hashtags AHSH ON AHSH.article_id = AR.id
JOIN hashtags HSH ON HSH.id = AHSH.hashtag_id
WHERE LOWER(HSH.value) IN ('politics', 'people', 'china')
ORDER BY AR.id
);
Here we fetch all articles except those that have any of the given hashtags.
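To see why the exclusion is written as "all articles minus the matching ones" rather than as a negated filter on the joined rows, here is a small in-memory sketch in plain Ruby. The sample data and method names are made up for illustration; no database is involved.

```ruby
# Made-up sample data: article id => hashtag values.
ARTICLES = {
  1 => %w[politics china],
  2 => %w[sports],
  3 => [],                 # an article without any hashtags
  4 => %w[sports people]
}.freeze

# Articles that have at least one of the wanted hashtags.
def matching_ids(wanted)
  ARTICLES.select { |_id, tags| tags.any? { |t| wanted.include?(t.downcase) } }.keys
end

# Exclusion as a set difference, which is what NOT IN (subquery) expresses.
def excluded_by_set_difference(wanted)
  ARTICLES.keys - matching_ids(wanted)
end

# Exclusion as a negated filter over inner-join rows (one row per article/hashtag pair).
def excluded_by_row_filter(wanted)
  rows = ARTICLES.flat_map { |id, tags| tags.map { |t| [id, t] } }
  rows.reject { |_id, t| wanted.include?(t.downcase) }.map(&:first).uniq
end

wanted = %w[politics people china]
p matching_ids(wanted)               # => [1, 4]
p excluded_by_set_difference(wanted) # => [2, 3]
p excluded_by_row_filter(wanted)     # => [2, 4]
```

The row filter drops article 3, which produces no join rows at all, and wrongly keeps article 4, which still has one non-matching hashtag. The set difference avoids both problems.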
3) Converting these queries to a Ruby method gives us the following:
def test3(hashtags, include = true)
  # code guard to prevent SQL-error when there are no hashtags given
  if hashtags.nil? || hashtags.strip.blank?
    return include ? [] : Article.all.to_a
  end
  basic_query = "
    SELECT DISTINCT ON (AR.id) AR.*
    FROM #{Article.table_name} AR
    JOIN articles_hashtags AHSH ON AHSH.article_id = AR.id
    JOIN #{Hashtag.table_name} HSH ON HSH.id = AHSH.hashtag_id
    WHERE LOWER(HSH.value) IN (:hashtags)
    ORDER BY AR.id"
  query = if include
    basic_query
  else
    "SELECT A.*
    FROM #{Article.table_name} A
    WHERE A.id NOT IN (#{basic_query.sub('AR.*', 'AR.id')})"
  end
  hashtag_arr = hashtags.split(' ').map(&:downcase) # to convert hashtags string into a list
  Article.find_by_sql [query, { hashtags: hashtag_arr }]
end
The method above will return an array of articles matching your conditions, empty or not.
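As a possible variation, the query construction can be separated from the database call so that it is easy to inspect on its own. The sketch below is not the method above. It swaps in EXISTS / NOT EXISTS for DISTINCT ON / NOT IN, and the helper name `hashtag_query` as well as the table names `articles`, `hashtags` and `articles_hashtags` are assumptions.

```ruby
# Hypothetical helper: returns [sql, binds] suitable for find_by_sql,
# or nil when no hashtags were given (the caller decides what that means).
# Table names are assumptions; adjust them to the real schema.
def hashtag_query(hashtags, include: true, join_table: 'articles_hashtags')
  values = hashtags.to_s.split.map(&:downcase).uniq
  return nil if values.empty?

  # Correlated subquery: is there a matching hashtag for this article?
  subquery = <<~SQL.strip
    SELECT 1
    FROM #{join_table} ahsh
    JOIN hashtags hsh ON hsh.id = ahsh.hashtag_id
    WHERE ahsh.article_id = a.id
      AND LOWER(hsh.value) IN (:hashtags)
  SQL

  operator = include ? 'EXISTS' : 'NOT EXISTS'
  sql = "SELECT a.* FROM articles a WHERE #{operator} (#{subquery}) ORDER BY a.id"
  [sql, { hashtags: values }]
end

sql, binds = hashtag_query("Politics people china", include: false)
puts sql
p binds
```

In a Rails app the result could then be passed on as `Article.find_by_sql(hashtag_query("politics people china", include: false))`, after handling the nil case. Because EXISTS yields each article at most once, no DISTINCT is needed, and NOT EXISTS also keeps articles that have no hashtags at all.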
Upvotes: 2
Reputation: 4802
Try this:
def test(hashtags, include = true)
  hashtags =
    if include
      Hashtag.where("LOWER(value) iLIKE ANY ( array[?] )", hashtags)
    else
      Hashtag.where("LOWER(value) NOT iLIKE ANY ( array[?] )", hashtags)
    end
  Article.joins(:hashtags).merge(hashtags).distinct
end
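Two caveats about the snippet above, illustrated with a plain-Ruby sketch. First, `array[?]` expects a list, so a space-separated string would need to be split first (for example with `hashtags.split`). Otherwise the whole string appears to end up as a single array element. Second, `NOT ILIKE ANY` is true as soon as the value differs from at least one element, which holds for almost every value once the list has two or more entries. `NOT ILIKE ALL` is the form that means "matches none of them". The method names below are made up, and `String#casecmp?` stands in for ILIKE without wildcards.

```ruby
# ILIKE without wildcards acts like a case-insensitive equality check,
# which String#casecmp? approximates here.

# value NOT ILIKE ANY (patterns): true if it differs from at least one pattern.
def not_ilike_any?(value, patterns)
  patterns.any? { |pattern| !value.casecmp?(pattern) }
end

# value NOT ILIKE ALL (patterns): true only if it differs from every pattern.
def not_ilike_all?(value, patterns)
  patterns.all? { |pattern| !value.casecmp?(pattern) }
end

patterns = "politics people china".split

p not_ilike_any?("politics", patterns) # => true, although "politics" is in the list
p not_ilike_all?("politics", patterns) # => false
p not_ilike_all?("sports", patterns)   # => true
```

Even with `ALL`, the filter still applies to individual hashtag rows reached through an inner join, so articles without any hashtags would still be missing from the result.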
Upvotes: 1