Reputation: 1393
I use below query to get content rows which has my_regex_pattern. But I don't know how many times the pattern hit for every row. What is the best way to get match count for every row in Postgres?
For example if a row's content is 'abcdefabcgh' and my regular expression is 'abc', I want 2 since 'abcdefabcgh' has two 'abc'.
SELECT content
FROM table1
WHERE content ~ 'my_regex_pattern'
Or how can I get rows which has matches more than a specific number. For example just give me records which has abc more than 4 times.
Upvotes: 1
Views: 3791
Reputation: 657787
Of course you can make it work with regexp_matches()
. Or better yet, regexp_split_to_table()
. To apply to a whole table, use a LATERAL
join (requires Postgres 9.3+):
SELECT content, ct
FROM table1 t, LATERAL (
SELECT count(*) - 1 AS ct
FROM regexp_split_to_table(t.content, 'abc')
) c
WHERE t.content ~ 'abc'; -- eliminate rows without match
For simple patterns like in the example in your question, you could also:
SELECT content, (length(content) - length(replace(content, 'abc', ''))) / length('abc')
FROM table1
WHERE content LIKE '%abc%';
Typically faster, since regular expression functions are costly. Also works for older versions.
Upvotes: 5