Reputation: 5498
Overview
I'm trying to query my PostgresSQL database as follows:
For a tag (i.e. "Laptop"), query which articles contain this tag and return a count of the other most common tags that are found in these other articles i.e.
Keyboard" (22)
Software (10)
These are my tables:
table tag (
id SERIAL PRIMARY KEY,
label TEXT UNIQUE
);
table article_tag (
article_id INT REFERENCES article(id) ON DELETE CASCADE ON UPDATE CASCADE,
tag_id INT REFERENCES tag(id) ON DELETE CASCADE ON UPDATE CASCADE ,
PRIMARY KEY (article_id, tag_id)
);
CREATE table article (
id SERIAL PRIMARY KEY,
title TEXT);
What have I tried:
I found another similar question and tried to modify the query but I clearly don't understand it as I get errors:
select
p1.article_tags, p2.article_tags, count(*) times_order_together
from
article p1
inner join
article p2
on
p1.id = p2.id
and
p1.id != p2.id
group by
p1.id, p2.id
order by
count(*) desc
"postgres-error : column p1.article_tags does not exist"
I also asked another similar question here where I provided Python code to achieve a similar output.
Upvotes: 0
Views: 75
Reputation: 58
The core of the solution is to self-join the article_tag table. I believe this is what you are looking for:
select t2.label, count(*) count
from tag t
left join article_tag at1 on (t.id = at1.tag_id)
left join article_tag at2 on (at1.article_id = at2.article_id and at2.tag_id != t.id)
left join tag t2 on (at2.tag_id = t2.id)
where t.label = 'Laptop'
group by t2.label
order by count(*) desc;
You start by filtering the tag table for your label:
from tag t
where t.label = 'Laptop'
Then you join the related articles:
left join article_tag at1 on (t.id = at1.tag_id)
And for those articles you join again their tags while omitting the original one:
left join article_tag at2 on (at1.article_id = at2.article_id and at2.tag_id != t.id)
Next you join the labels for those tags:
left join tag t2 on (at2.tag_id = t2.id)
Finally you group by label, count the rows and return them in descending order:
select t2.label, count(*) count
...
group by t2.label
order by count(*) desc;
Upvotes: 1