An economist

Reputation: 1311

Match a pattern multiple times using regex

I am trying to match multiple occurrences of the same pattern within a string. Unfortunately, ustrregexm and ustrregexs return only the first match. I also don't know in advance how many matches there could be, so hard-coding n matches is not an option. Is there a way to find all matches in Stata?

Example:

clear all

input x str250 y
1 "123 12"
2 "345 678"
3 "000 000 000"
4 "111"
5 "00"
6 "000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000"
end

* Returns only the first match
gen match = ustrregexs(0) if ustrregexm(y, "(\d{3})+")

Upvotes: 0

Views: 355

Answers (1)

Nick Cox

Reputation: 37368

moss from SSC (install it with ssc install moss) is dedicated to precisely this problem. If "natively" excludes community-contributed commands, then you need to write your own code.

clear all

input x str20 y
1 "123 12"
2 "345 678"
3 "000 000 000"
4 "111"
5 "00"
end

moss y, match("([0-9][0-9][0-9])") regex

list

     +--------------------------------------------------------------------------------+
     | x             y   _count   _match1   _pos1   _match2   _pos2   _match3   _pos3 |
     |--------------------------------------------------------------------------------|
  1. | 1        123 12        1       123       1                 .                 . |
  2. | 2       345 678        2       345       1       678       5                 . |
  3. | 3   000 000 000        3       000       1       000       5       000       9 |
  4. | 4           111        1       111       1                 .                 . |
  5. | 5            00        0                 .                 .                 . |
     +--------------------------------------------------------------------------------+
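
If community-contributed commands are ruled out, one way to write your own code is a loop that pulls out the first remaining match, blanks it out, and repeats until no observation has a match left. This is only a rough sketch, not what moss does internally: it assumes Stata 14 or later (for the ustrregex* functions) and the variable y from the example above, and the names rest, more and match1, match2, ... are purely illustrative.

```stata
* Sketch only: native alternative to moss (assumes Stata 14+ and the variable y above)
* rest is a working copy of y that is consumed match by match
gen rest = y
local j = 0

* count the observations that still contain a three-digit run
count if ustrregexm(rest, "[0-9]{3}")
local more = r(N)

while `more' > 0 {
    local ++j
    * store the first remaining match in its own variable
    gen match`j' = ustrregexs(0) if ustrregexm(rest, "[0-9]{3}")
    * blank out that match so the next pass finds the following one
    replace rest = ustrregexrf(rest, "[0-9]{3}", " ")
    count if ustrregexm(rest, "[0-9]{3}")
    local more = r(N)
}

drop rest
list
```

The loop creates as many match variables as the observation with the most matches needs, so nothing has to be hard-coded. Unlike moss, this sketch does not record match positions or a count; those would need extra lines.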

Upvotes: 1
