Reputation: 616
I have a table education
that has a column university
. For each of the rows in the table I want to find 3 most similar universities from the table.
Here is my query that finds 3 most similar universities to a given input:
select distinct(university),
similarity(unaccent(lower(university)),
unaccent(lower('Boston university')))
from education
order by similarity(unaccent(lower(university)),
unaccent(lower('Boston university'))) desc
limit 3;
It works fine. But now I would like to modify this query so that I get two columns and a row for each existing university in the table: the first column would be the university name and the second would be the three most similar universities found in the database (or if its easier - four columns where the first is the university and the next 3 are the most similar ones).
What should this statement look like?
Upvotes: 0
Views: 258
Reputation: 222402
You could use an inline aggregated query:
with u as (select distinct university from education)
select
university,
(
select string_agg(u.university, ',')
from u
where u.university != e.university
order by similarity(
unaccent(lower(u.university)),
unaccent(lower(e.university))
) desc
limit 3
) similar_universities
from education e
This assumes that a given university may occur more than once in the education table (hence the need for a cte).
Upvotes: 1
Reputation: 1269443
I think a lateral join and aggregation does what you want:
select e.university, e2.universities
from education e cross join lateral
(select array_agg(university order by sim desc) as universities
from (select e2.university,
similarity(unaccent(lower(e2.university)),
unaccent(lower(e.university))
) as sim
from education e2
order by sim desc
limit 3
) e2
) e2;
Note: The most similar university is probably the one with the same name. (You can filter that out with a where
clause in the subquery.)
This returns the value as an array, because I prefer working with arrays rather than strings in Postgres.
Upvotes: 0