woodbine
woodbine

Reputation: 662

PostgreSQL select uniques from three different columns

I have one large table 100m+ rows and two smaller ones 2m rows ea. All three tables have a column of company names that need to be sent out to an API for matching. I want to select the strings from each column and then combine into a single column of unique strings.

I'm using a version of this response, but unsurprisingly the performance is very slow. Combined 2 columns into one column SQL

SELECT DISTINCT 
    unnest(string_to_array(upper(t.buyer) || '#' || upper(a.aw_supplier_name) || '#' || upper(b.supplier_source_string), '#')) 
FROM
    tenders t,
    awards a,
    banking b
;

Any ideas on a more performant way to achieve this?

Update: the banking table is the largest table with 100m rows.

Upvotes: 0

Views: 74

Answers (2)

Ed Mendez
Ed Mendez

Reputation: 1693

Assuming PostgreSQL 9.6 and borrowing the select from rd_nielsen's answer, the following should give you a comma delimited string of the distinct names.

WITH cte
AS (
    SELECT UPPER(T.buyer) NAMES
    FROM tenders T

    UNION

    SELECT UPPER(A.aw_supplier_name) NAMES
    FROM awards A

    UNION

    SELECT UPPER(b.supplier_source_string) NAMES
    FROM banking b
    )
SELECT array_to_string(ARRAY_AGG(cte.names), ',')
FROM cte

Upvotes: 1

rd_nielsen
rd_nielsen

Reputation: 2459

To get just a list of the combined names from all three tables, you could instead union together the selections from each table, like so:

select
    upper(t.buyer)
from
    tenders t
union
select
    upper(a.aw_supplier_name)
from
    awards a
union
select
    upper(b.supplier_source_string)
from
    banking b
;

Upvotes: 1

Related Questions