Reputation: 662
I have one large table 100m+ rows and two smaller ones 2m rows ea. All three tables have a column of company names that need to be sent out to an API for matching. I want to select the strings from each column and then combine into a single column of unique strings.
I'm using a version of this response, but unsurprisingly the performance is very slow. Combined 2 columns into one column SQL
SELECT DISTINCT
unnest(string_to_array(upper(t.buyer) || '#' || upper(a.aw_supplier_name) || '#' || upper(b.supplier_source_string), '#'))
FROM
tenders t,
awards a,
banking b
;
Any ideas on a more performant way to achieve this?
Update: the banking table is the largest table with 100m rows.
Upvotes: 0
Views: 74
Reputation: 1693
Assuming PostgreSQL 9.6 and borrowing the select from rd_nielsen's answer, the following should give you a comma delimited string of the distinct names.
WITH cte
AS (
SELECT UPPER(T.buyer) NAMES
FROM tenders T
UNION
SELECT UPPER(A.aw_supplier_name) NAMES
FROM awards A
UNION
SELECT UPPER(b.supplier_source_string) NAMES
FROM banking b
)
SELECT array_to_string(ARRAY_AGG(cte.names), ',')
FROM cte
Upvotes: 1
Reputation: 2459
To get just a list of the combined names from all three tables, you could instead union
together the selections from each table, like so:
select
upper(t.buyer)
from
tenders t
union
select
upper(a.aw_supplier_name)
from
awards a
union
select
upper(b.supplier_source_string)
from
banking b
;
Upvotes: 1