Reputation: 839
I have several strings like this:
1:723:NVDA:NSDQ::Z4189274321:Snapshot Report:10:COMPUTER & TECHNOLOGY:241:Semiconductor
I need to return everything after the second colon (:
). So, from the above string I need:
NVDA:NSDQ::Z4189274321:Snapshot Report:10:COMPUTER & TECHNOLOGY:241:Semiconductor
Seeing lots of examples, but none are really working for this task.
Upvotes: 1
Views: 88
Reputation: 3135
The following demonstrates using NULLIF
and REGEXP_REPLACE
with the regular expression ^(([^:]*:){2}(.*))|(.*)
to extract the portion of a string that occurs after a second :
, or NULL
if there are no characters after the second :
:
WITH
test_cases (id, test_value, expected_result) AS (
VALUES
(1, NULL, NULL),
(2, '', NULL),
(3, '1:', NULL),
(4, '1:2:', NULL),
(5, '1:2: ', ' '),
(6,
'1:723:NVDA:NSDQ::Z4189274321:Snapshot Report:10:COMPUTER & TECHNOLOGY:241:Semiconductor',
'NVDA:NSDQ::Z4189274321:Snapshot Report:10:COMPUTER & TECHNOLOGY:241:Semiconductor'
)
),
runs AS (
SELECT
tc.id,
tc.test_value,
tc.expected_result,
NULLIF(
REGEXP_REPLACE(tc.test_value, '^(([^:]*:){2}(.*))|(.*)', '\3'),
''
) AS actual_result
FROM
test_cases tc
)
SELECT
runs.id,
runs.expected_result IS NOT DISTINCT FROM runs.actual_result AS passed,
runs.actual_result
FROM
runs
ORDER BY
runs.id;
The following script measures the overhead associated with generating test strings (test expression s.string||gs.n::TEXT
) and the execution time for each of four expressions that return the portion of the string that occurs after the second :
:
DROP TABLE IF EXISTS test_runs;
CREATE TEMPORARY TABLE test_runs (
id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
test_case INTEGER,
test_expression TEXT,
execution_time_ms NUMERIC
);
DO language plpgsql $BLOCK$
DECLARE
number_of_iterations CONSTANT INTEGER := 100;
number_of_rows CONSTANT INTEGER := 10000;
test_expressions CONSTANT TEXT[] := ARRAY[
$$s.string||gs.n::TEXT$$,
$$SUBSTRING(s.string||gs.n::TEXT, '^.+?:.+?:(.*$)')$$,
$$ARRAY_TO_STRING((STRING_TO_ARRAY(s.string||gs.n::TEXT, ':'))[3:], ':')$$,
$$ARRAY_TO_STRING((REGEXP_SPLIT_TO_ARRAY(s.string||gs.n::TEXT, ':'))[3:], ':')$$,
$$REGEXP_REPLACE(s.string||gs.n::TEXT, '^(([^:]*:){2}(.*))|(.*)', '\3')$$
];
iteration INTEGER;
results JSON;
test_case INTEGER;
test_run INTEGER := 0;
BEGIN
FOR iteration IN 1 .. number_of_iterations LOOP
FOR test_case IN 1 .. CARDINALITY(test_expressions) LOOP
EXECUTE
FORMAT($$
EXPLAIN (ANALYZE, FORMAT JSON)
SELECT %1$s
FROM
(VALUES ('1:723:NVDA:NSDQ::Z4189274321:Snapshot Report:10:COMPUTER & TECHNOLOGY:241:Semiconductor')) s(string)
CROSS JOIN GENERATE_SERIES(1, $2) gs(n);$$,
test_expressions[test_case])
INTO results
USING test_case, number_of_rows;
INSERT INTO test_runs(test_case, test_expression, execution_time_ms)
VALUES (test_case, test_expressions[test_case], (results -> 0 ->> 'Execution Time')::NUMERIC);
END LOOP;
END LOOP;
END;
$BLOCK$;
SELECT AVG(execution_time_ms)::NUMERIC(18,3) AS mean_execution_time_ms, test_expression
FROM test_runs
GROUP BY test_case, test_expression
ORDER BY mean_execution_time_ms, test_case;
mean_execution_time_ms | test_expression |
---|---|
3.486 | s.string||gs.n::TEXT |
24.911 | ARRAY_TO_STRING((STRING_TO_ARRAY(s.string||gs.n::TEXT, ':'))[3:], ':') |
41.494 | SUBSTRING(s.string||gs.n::TEXT, '^.+?:.+?:(.*$)') |
61.743 | REGEXP_REPLACE(s.string||gs.n::TEXT, '^(([^:]:){2}(.))|(.*)', '\3') |
87.302 | ARRAY_TO_STRING((REGEXP_SPLIT_TO_ARRAY(s.string||gs.n::TEXT, ':'))[3:], ':') |
Upvotes: 1
Reputation: 658072
substring()
with a regular expression:
SELECT substring(string, '^.+?:.+?:(.*$)');
The regexp explained:
^
... anchor to start of string (optional)
.
... any character
+?
... 1 or more times, non-greedy
:
... the literal character
(2 times)
(.*$)
... any characters 0 or more times, greedy; in capturing parentheses
test=> SELECT substring('1:723:NVDA:NSDQ::Z4189274321:Snapshot Report:10:COMPUTER & TECHNOLOGY:241:Semiconductor', '^.+?:.+?:(.*$)');
substring
-----------------------------------------------------------------------------------
NVDA:NSDQ::Z4189274321:Snapshot Report:10:COMPUTER & TECHNOLOGY:241:Semiconductor
(1 row)
If performance matters ...
The above is simple and short, so it performs decently. But regular expressions are expensive. Even (costly) converting to an array and back like Adrian demonstrates is typically faster.
This is a couple times faster than anything suggested so far:
SELECT substring(string, length(split_part(string, ':', 1))
+ length(split_part(string, ':', 2))
+ 3);
Upvotes: 1
Reputation: 19684
select array_to_string((string_to_array('1:723:NVDA:NSDQ::Z4189274321:Snapshot Report:10:COMPUTER & TECHNOLOGY:241:Semiconductor', ':'))[3:], ':');
NVDA:NSDQ::Z4189274321:Snapshot Report:10:COMPUTER & TECHNOLOGY:241:Semiconductor
This uses string_to_array
from here:
https://www.postgresql.org/docs/current/functions-string.html
string_to_array ( string text, delimiter text [, null_string text ] ) → text[]
Splits the string at occurrences of delimiter and forms the resulting fields into a text array. If delimiter is NULL, each character in the string will become a separate element in the array. If delimiter is an empty string, then the string is treated as a single field. If null_string is supplied and is not NULL, fields matching that string are replaced by NULL. See also array_to_string.
string_to_array('xx~~yy~~zz', '~~', 'yy') → {xx,NULL,zz}
This breaks the string down into it's component parts and makes an array out of them. The [3:]
selects the third through the end of the array elements from the array.
Then from here:
https://www.postgresql.org/docs/current/functions-array.html
array_to_string ( array anyarray, delimiter text [, null_string text ] ) → text
Converts each array element to its text representation, and concatenates those separated by the delimiter string. If null_string is given and is not NULL, then NULL array entries are represented by that string; otherwise, they are omitted. See also string_to_array.
array_to_string(ARRAY[1, 2, 3, NULL, 5], ',', '') → 1,2,3,,5
This reconstitutes the string by concatenating the array elements with the delimiter.
Upvotes: 1
Reputation: 127367
This one returns an array, by splitting the content on the :
SELECT (regexp_split_to_array('1:723:NVDA:NSDQ::Z4189274321:Snapshot Report:10:COMPUTER & TECHNOLOGY:241:Semiconductor',':'))[3:];
If you need a single string:
SELECT array_to_string((regexp_split_to_array('1:723:NVDA:NSDQ::Z4189274321:Snapshot Report:10:COMPUTER & TECHNOLOGY:241:Semiconductor',':'))[3:], ':');
This one uses a regex to split the content and also returns an array, but just one element and that's easy to convert to a single string:
SELECT (regexp_matches('1:723:NVDA:NSDQ::Z4189274321:Snapshot Report:10:COMPUTER & TECHNOLOGY:241:Semiconductor', '^[^:]*:[^:]*:(.*)', 'g'))[1];
Upvotes: 0