Reputation: 1311
I am trying to match multiple occurrences of the same pattern within a string. Unfortunately, using ustrregexs() and ustrregexm() returns only the first match. Additionally, I don't know how many matches there could be, so hard-coding n matches is not an option. Is there a way to find all matches in Stata?
Example:
clear all
input x str250 y
1 "123 12"
2 "345 678"
3 "000 000 000"
4 "111"
5 "00"
6 "000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000"
end
* Returns only the first match
gen match = ustrregexs(0) if ustrregexm(y, "(\d{3})+")
Upvotes: 0
Views: 355
Reputation: 37368
moss from SSC (install it with ssc install moss) is dedicated to precisely this question. If "natively" excludes community-contributed commands, then you need to write your own code.
clear all
input x str20 y
1 "123 12"
2 "345 678"
3 "000 000 000"
4 "111"
5 "00"
end
moss y, match("([0-9][0-9][0-9])") regex
list
     +--------------------------------------------------------------------------------+
     | x             y   _count   _match1   _pos1   _match2   _pos2   _match3   _pos3 |
     |--------------------------------------------------------------------------------|
  1. | 1        123 12        1       123       1                 .                 . |
  2. | 2       345 678        2       345       1       678       5                 . |
  3. | 3   000 000 000        3       000       1       000       5       000       9 |
  4. | 4           111        1       111       1                 .                 . |
  5. | 5            00        0                 .                 .                 . |
     +--------------------------------------------------------------------------------+
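If community-contributed commands are ruled out, "write your own code" could look roughly like the loop below. This is only a sketch, not what moss does internally. It assumes Stata 14 or later (for the Unicode regex functions), and the names rest, n_matches and match1, match2, ... are made up for illustration. The idea is to take the first match, blank it out of a working copy with ustrregexrf(), and repeat until no observation has a match left. Unlike moss, it does not record positions.

```stata
clear all
input x str20 y
1 "123 12"
2 "345 678"
3 "000 000 000"
4 "111"
5 "00"
end

* Working copy that is consumed match by match; y itself is left untouched
gen strL rest = y
gen n_matches = 0

local j = 0
local more = 1
while `more' {
    local ++j
    * ustrregexs(0) holds the first match found in the current observation
    gen match`j' = ustrregexs(0) if ustrregexm(rest, "[0-9]{3}")
    count if !missing(match`j')
    if r(N) == 0 {
        * No observation has a further match: drop the empty variable and stop
        drop match`j'
        local more = 0
    }
    else {
        replace n_matches = n_matches + 1 if !missing(match`j')
        * Overwrite the first remaining match with a space so that the next
        * pass finds the following one (assumes the pattern cannot match a space)
        replace rest = ustrregexrf(rest, "[0-9]{3}", " ")
    }
}
drop rest
list
```

On the five sample rows this should create match1 to match3, in line with _match1 to _match3 from moss. The number of variables is set by whichever observation has the most matches, so nothing has to be hard-coded. Note that a pattern that can match the empty string would keep the loop running forever.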
Upvotes: 1