jcf
jcf

Reputation: 602

PostgreSQL search lists of substrings in string column

I have the following table in a postreSQL database (simplified for clarity):

  | serverdate |           name | value 
  |-------------------------------------
0 | 2019-12-01 | A LOC 123 DISP | 1
1 | 2019-12-01 | B LOC 456 DISP | 2
2 | 2019-12-01 | C LOC 777 DISP | 0
3 | 2019-12-01 | D LOC 000 DISP | 10
4 | 2019-12-01 | A LOC 700 DISP | 123
5 | 2019-12-01 | F LOC 777 DISP | 8

name columns is of type string. The substrings LOC and DISP can have other values of different lengths but are not of interest in this question.

The problem: I want to SELECT the rows that only contain a certain substring. There are several substrings, passed as an ARRAY, in the following format:

['A_123', 'F_777'] # this is an example only

I would want to select all the rows that contain the first part of the substring (sepparating it by the underscore '_'), as well as the second. In this example, with the mentioned array, I should obtain rows 0 and 5 (as these are the only ones with exact matches in both parts of the):

  | serverdate |           name | value 
  |-------------------------------------
0 | 2019-12-01 | A LOC 123 DISP | 1
5 | 2019-12-01 | F LOC 777 DISP | 8

Row 4 has the first part of the substring correct, but not the other one, so it shouldn't be returned. Same thing with row 2 (only second part matches).

How could this query be done? I'm relatively new to SQL.

This query is part of process in Python, so I can adjust the input parameter (the substring array) if needed, but the behaviour must be the same as the one described.

Thanks!

Upvotes: 3

Views: 2484

Answers (4)

jcf
jcf

Reputation: 602

Thanks for your answers! Solution by Larry B got me an error, but it was caused by external factors (I run the queries using an internal tool developed by my company and it threw errors when using the % wildcard. Strange behaviour, I already contacted support team), so I could not test it properly.

Solution by Jim Jones seemed an alternative, but I found that, in some cases, the values in the name field would look like these (didn't notice it when writing the question, as it a rare case):

ABC LOC 123 DISP

So I modified the solution a little bit so as to grab the first part of the name when splitting it by the ' ' character.

(TLDR: 1st substring of name could be of arbitrary length, but is always at the start)

My solution is this one:

SELECT * FROM 
  (SELECT serverdate, split_part(name, ' ', 1)||'_'||
          regexp_replace(name, '\D*', '', 'g') AS name, value 
  FROM t) j 
WHERE name IN('A_123', 'F_777');

Upvotes: 1

Jim Jones
Jim Jones

Reputation: 19693

Have you tried with regexp_replace and a subquery?

SELECT * FROM 
  (SELECT serverdate, substring(name from 1 for 1)||'_'||
          regexp_replace(name, '\D*', '', 'g') AS name, value 
  FROM t) j 
WHERE name IN('A_123', 'F_777');

Or using a CTE

WITH j AS (
SELECT serverdate, substring(name from 1 for 1)||'_'||
       regexp_replace(name, '\D*', '', 'g') AS name2, 
       value,name
FROM t 
) SELECT serverdate,name,value FROM j 
  WHERE name2 IN('A_123', 'F_777');


 serverdate |      name      | value 
------------+----------------+-------
 2019-12-01 | A LOC 123 DISP |     1
 2019-12-01 | F LOC 777 DISP |     8
(2 Zeilen)

Upvotes: 2

JK1993
JK1993

Reputation: 138

split_part(name,'_',1) + '_' + split_part(name,'_',3) as name

this is the break down of the query: A + _ + 123 = A_123

Upvotes: 0

Larry Beasley
Larry Beasley

Reputation: 334

Just unnest the array and join the table using a like clause

select
    *
from
    Table1
join 
(
    select 
        '%'||replace(unnest, '_', '%')||'%' pat 
    from 
        unnest(array['A_123', 'F_777'])
) pat_table on "name" like "pat"

Just replace unnest(array['A_123', 'F_777']) with unnest(string_to_array(str_variable, ','))

Upvotes: 1

Related Questions