Rodrigo

Reputation: 69

Stata Regular expressions extracting numerical values

I have some data that looks like this

var1
h 01 .00 .0 abc
d 1.0 .0 14.0abc
1,0.0 0.0 .0abc

The last three alphabetic characters (abc) are the same in every row, and I want to extract all the numerical values in the string. The code I'm using looks like this:

gen x1=regexs(1) if regexm(var1,"([0-9]+) [ ]*(abc)*$")

However, this code only extracts the digits immediately before abc and stops at a space or a period (.). For example, only the 0 before abc is extracted from the first row. Is there a way to extract all the numerical values that come before the alphabetic characters?

Upvotes: 0

Views: 1784

Answers (1)

Nick Cox

Reputation: 37368

As @Roberto Ferrer points out, your question isn't very clear, but here is an example using moss from SSC (install it with ssc install moss):

. clear 

. input str16 var1

                var1
1. "h 01 .00 .0 abc"
2. "d 1.0 .0 14.0abc"
3. "1,0.0 0.0 .0abc"
4. end 

. moss var1, regex match("([0-9]+\.*[0-9]*|\.[0-9]+)") 

. l _match*

   +---------------------------------------+
   | _match1   _match2   _match3   _match4 |
   |---------------------------------------|
1. |      01       .00        .0           |
2. |     1.0        .0      14.0           |
3. |       1       0.0       0.0        .0 |
   +---------------------------------------+
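If installing moss is not an option, roughly the same result can probably be had with built-in functions. The following is only a sketch: it assumes Stata 14 or later (for ustrregexm(), ustrregexs() and ustrregexrf()) and is meant to be run from a do-file. The names rest and num1, num2, ... are made up for illustration and are not something moss produces.

```stata
* Sketch only: assumes Stata 14+ for ustrregexm(), ustrregexs(), ustrregexrf().
* The names rest, num1, num2, ... are hypothetical.
clear
input str16 var1
"h 01 .00 .0 abc"
"d 1.0 .0 14.0abc"
"1,0.0 0.0 .0abc"
end

* Same pattern as in the moss call above
local re "([0-9]+\.*[0-9]*|\.[0-9]+)"

gen rest = var1              // working copy that is consumed match by match
local j = 0
count if ustrregexm(rest, "`re'")
local more = r(N)
while `more' > 0 {
    local ++j
    * keep the first remaining match in each observation ...
    gen num`j' = ustrregexs(0) if ustrregexm(rest, "`re'")
    * ... then blank it out so the next pass finds the following one
    replace rest = ustrregexrf(rest, "`re'", " ")
    count if ustrregexm(rest, "`re'")
    local more = r(N)
}
drop rest
list num*
```

The loop stops once no observation has a match left, so the number of num variables should equal the largest number of matches in any row. The matched text is replaced with a space rather than an empty string so that the characters on either side of a removed match cannot join up into a new number. The results are strings, as with moss; destring or real() would be needed to get numeric variables.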

Upvotes: 1
