Reputation: 908
I was looking at this question and the accepted answer gives this as a solution for the case when there are fewer than n characters in the string:
^(([^>]*>){4}|.*)
However, I have done a fiddle here, and it shows that this regex will just simply return the entire string all of the time.
This code:
SELECT
SUBSTRING(a FROM '^(([^>]*>){4}|.*)'),
a,
LENGTH(SUBSTRING(a FROM '^(([^>]*>){4}|.*)')),
LENGTH(a),
LENGTH(SUBSTRING(a FROM '^(([^>]*>){4}|.*)')) = LENGTH(a)
FROM s
WHERE LENGTH(SUBSTRING(a FROM '^(([^>]*>){4}|.*)')) = LENGTH(a) IS false;
after several runs returns no records - meaning that the regex is doing nothing.
I would like a regex which returns up to the fourth >
character (not including it) OR the entire string if the string only contains 3 or fewer >
characters. RTRIM()
can always be used to trim the final >
if not including it is too tricky - having an answer which gives both possibilities would help me to deepen my understanding of regexes!
This is not a duplicate - it's certainly related, but I'd like to correct the error in the original answer - and provide a correct answer of my own.
Upvotes: 1
Views: 315
Reputation: 163207
You can repeat matching 0-3 times including the >
using
^(?:[^>]*>){0,3}[^>]*
^
Start of string(?:[^>]*>){0,3}
Repeat 0 - 3 times matching any character except >
and then match >
[^>]*
Optionally match any char except >
See a regex demo.
If there should be at least a single >
then the quantifier can be {1,3}
Upvotes: 1
Reputation: 626728
You can use
REGEXP_REPLACE(a, '^((?:[^>]*>){4}).*', '\1')
See the regex demo. Details:
^
- start of string((?:[^>]*>){4})
- Group 1 (\1
): four sequences of any chars other than >
and then a >
char.*
- the rest of the line.Here is a test:
CREATE TABLE s
(
a TEXT
);
INSERT INTO s VALUES
('afsad>adfsaf>asfasf>afasdX>asdffs>asfdf>'),
('23433>433453>4>4559>455>3433>'),
('adfd>adafs>afadsf>');
SELECT REGEXP_REPLACE(a, '^((?:[^>]*>){4}).*', '\1') as Output FROM s;
Output:
Upvotes: 1