Match a pattern multiple times using regex

Question

I am trying to match multiple occurrences of the same pattern within a string. Unfortunately, ustrregexm and ustrregexs return only the first match. I also don't know in advance how many matches there could be, so hard-coding n matches is not an option. Is there a way to find all matches in Stata?

Example:

clear all

input x str250 y
1 "123 12"
2 "345 678"
3 "000 000 000"
4 "111"
5 "00"
6 "000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000"
end

* Returns only the first match
gen match = ustrregexs(0) if ustrregexm(y, "(\d{3})+")

Nick Cox · Accepted Answer

moss from SSC is dedicated to precisely this question. If "natively" excludes community-contributed commands, then you need to write your own code.

clear all

input x str20 y
1 "123 12"
2 "345 678"
3 "000 000 000"
4 "111"
5 "00"
end

* moss is community-contributed; install it once with: ssc install moss
moss y, match("([0-9][0-9][0-9])") regex

list 

     +--------------------------------------------------------------------------------+
     | x             y   _count   _match1   _pos1   _match2   _pos2   _match3   _pos3 |
     |--------------------------------------------------------------------------------|
  1. | 1        123 12        1       123       1                 .                 . |
  2. | 2       345 678        2       345       1       678       5                 . |
  3. | 3   000 000 000        3       000       1       000       5       000       9 |
  4. | 4           111        1       111       1                 .                 . |
  5. | 5            00        0                 .                 .                 . |
     +--------------------------------------------------------------------------------+
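
If community-contributed commands are ruled out, the "write your own code" route might look roughly like the loop below. This is only a sketch, not what moss does internally: it repeatedly stores the first remaining match and then strips it from a working copy with ustrregexrf until no observation matches any more. The names work, n_match and match1, match2, ... are made up for illustration. It also assumes that deleting a match cannot splice the surrounding text into a new, spurious match; that holds for this three-digit pattern but may not for others.

```stata
* Sketch: collect every 3-digit match without moss
* (work, n_match and match# are illustrative names, not from moss)
clear all

input x str20 y
1 "123 12"
2 "345 678"
3 "000 000 000"
4 "111"
5 "00"
end

local re "[0-9]{3}"

* working copy of y; matches are stripped from it one at a time
gen work = y
* running count of matches per observation
gen n_match = 0

local j = 0
count if ustrregexm(work, "`re'")
local more = r(N)

while `more' > 0 {
    local ++j
    * store the first remaining match, if there is one
    gen match`j' = ustrregexs(0) if ustrregexm(work, "`re'")
    replace n_match = n_match + 1 if match`j' != ""
    * remove that first match so the next pass finds the following one
    replace work = ustrregexrf(work, "`re'", "")
    * stop once no observation has a match left
    count if ustrregexm(work, "`re'")
    local more = r(N)
}

drop work
list
```

The loop creates only as many match# variables as the longest run of matches needs, so nothing has to be hard-coded. On the sample data n_match should agree with moss's _count, though unlike moss this sketch does not record match positions.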
