Yavuz
Yavuz

Reputation: 1393

Match count of a regular expression for every row

I use below query to get content rows which has my_regex_pattern. But I don't know how many times the pattern hit for every row. What is the best way to get match count for every row in Postgres?

For example if a row's content is 'abcdefabcgh' and my regular expression is 'abc', I want 2 since 'abcdefabcgh' has two 'abc'.

SELECT content
FROM table1
WHERE content ~ 'my_regex_pattern'

Or how can I get rows which has matches more than a specific number. For example just give me records which has abc more than 4 times.

Upvotes: 1

Views: 3791

Answers (1)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 657787

Of course you can make it work with regexp_matches(). Or better yet, regexp_split_to_table(). To apply to a whole table, use a LATERAL join (requires Postgres 9.3+):

SELECT content, ct
FROM   table1 t, LATERAL (
   SELECT count(*) - 1 AS ct
   FROM   regexp_split_to_table(t.content, 'abc')
   ) c
WHERE  t.content ~ 'abc';  -- eliminate rows without match

For simple patterns like in the example in your question, you could also:

SELECT content, (length(content) - length(replace(content, 'abc', ''))) / length('abc')
FROM   table1
WHERE  content LIKE '%abc%';

Typically faster, since regular expression functions are costly. Also works for older versions.

Upvotes: 5

Related Questions