nipy
nipy

Reputation: 5498

Query which articles contain a tag and return a count of the other most common tags

Overview

I'm trying to query my PostgresSQL database as follows:

For a tag (i.e. "Laptop"), query which articles contain this tag and return a count of the other most common tags that are found in these other articles i.e.

Keyboard" (22)
Software (10)

These are my tables:

table tag (
  id SERIAL PRIMARY KEY,
  label TEXT UNIQUE
);

table article_tag (
article_id INT REFERENCES article(id) ON DELETE CASCADE ON UPDATE CASCADE,
tag_id INT REFERENCES tag(id) ON DELETE CASCADE ON UPDATE CASCADE ,
PRIMARY KEY (article_id, tag_id)
);

  CREATE table article (
 id SERIAL PRIMARY KEY,
title TEXT);

What have I tried:

I found another similar question and tried to modify the query but I clearly don't understand it as I get errors:

select 
    p1.article_tags, p2.article_tags, count(*) times_order_together 
from
    article p1
    inner join
    article p2
    on
        p1.id = p2.id
        and
        p1.id != p2.id
group by
    p1.id, p2.id
order by
    count(*) desc 

"postgres-error : column p1.article_tags does not exist"

I also asked another similar question here where I provided Python code to achieve a similar output.

Upvotes: 0

Views: 75

Answers (1)

ilowerp
ilowerp

Reputation: 58

The core of the solution is to self-join the article_tag table. I believe this is what you are looking for:

select t2.label, count(*) count
from tag t
left join article_tag at1 on (t.id = at1.tag_id)
left join article_tag at2 on (at1.article_id = at2.article_id and at2.tag_id != t.id)
left join tag t2 on (at2.tag_id = t2.id)
where t.label = 'Laptop'
group by t2.label
order by count(*) desc;

You start by filtering the tag table for your label:

from tag t
where t.label = 'Laptop'

Then you join the related articles:

left join article_tag at1 on (t.id = at1.tag_id)

And for those articles you join again their tags while omitting the original one:

left join article_tag at2 on (at1.article_id = at2.article_id and at2.tag_id != t.id)

Next you join the labels for those tags:

left join tag t2 on (at2.tag_id = t2.id)

Finally you group by label, count the rows and return them in descending order:

select t2.label, count(*) count
...
group by t2.label
order by count(*) desc;

Upvotes: 1

Related Questions