Reputation: 41
I have a string called 'raw', and I am trying to parse it in Ruby as follows:
raw = "HbA1C ranging 8.0—10.0%"
raw.scan(/\d*\.?\d+[ ]*(-+|\342\200\224)[ ]*\d*\.?\d+/)
The output from the above is [], but I think it should be ["8.0—10.0"].
Does anyone have any insight into what is wrong with the above regex statement?
Note: \342\200\224 is the UTF-8 encoding of — (em dash, U+2014), written as octal byte escapes.
The piece that is not working is:
(-+|\342\200\224)
I think it should mean: match one or more hyphens (-), OR match the byte sequence \342\200\224.
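A minimal sketch for checking that the alternation behaves as described, assuming Ruby 1.9 or later. It writes the em dash as the codepoint escape \u2014 instead of the octal bytes \342\200\224 (a substitution, not part of the original code) and makes the group non-capturing with (?:...). The name range_re is only illustrative.

```ruby
# Assumes Ruby 1.9+: \u2014 (em dash) replaces the octal bytes
# \342\200\224, so both the string and the regexp are UTF-8.
raw = "HbA1C ranging 8.0\u201410.0%"

# (?:...) groups "one or more hyphens OR an em dash" without capturing,
# so scan returns each whole match instead of just the separator.
range_re = /\d*\.?\d+[ ]*(?:-+|\u2014)[ ]*\d*\.?\d+/

raw.scan(range_re)                        # => ["8.0\u201410.0"]
"HbA1C ranging 8.0-10.0%".scan(range_re)  # => ["8.0-10.0"]
"8.0--10.0".scan(range_re)                # => ["8.0--10.0"]
```

Both the em dash and a run of hyphens match, so the alternation itself does what the question expects.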
Any help would be greatly appreciated!
Upvotes: 0
Views: 384
Reputation: 734
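The original regex works for me (Ruby 1.8.7); it just needs the group to be non-capturing, and then scan will output the entire match. When a pattern contains capture groups, scan returns the captured groups for each match rather than the whole match. Alternatively, switch to String#[] or String#match instead of String#scan and leave the regex unchanged.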
raw = "HbA1C ranging 8.0—10.0%"
raw.scan(/\d*\.?\d+[ ]*(?:-+|\342\200\224)[ ]*\d*\.?\d+/)
# => ["8.0—10.0"]
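To see why the capturing group matters, here is a sketch, again assuming Ruby 1.9 or later and writing the em dash as \u2014 instead of \342\200\224. It keeps the question's capturing group and compares scan with String#[] and String#match; the variable names are hypothetical.

```ruby
# Assumes Ruby 1.9+; \u2014 stands in for the octal bytes \342\200\224.
raw = "HbA1C ranging 8.0\u201410.0%"

# The question's pattern, with a *capturing* group around the separator.
capturing = /\d*\.?\d+[ ]*(-+|\u2014)[ ]*\d*\.?\d+/

# With groups present, scan yields one array of captures per match,
# so only the separator comes back.
scanned = raw.scan(capturing)         # => [["\u2014"]]

# String#[] and MatchData#[0] give the whole match (group 0),
# even though the group is still capturing.
whole     = raw[capturing]            # => "8.0\u201410.0"
matched   = raw.match(capturing)[0]   # => "8.0\u201410.0"
separator = raw.match(capturing)[1]   # => "\u2014"
```

String#[] and String#match stop at the first match, while scan finds every one, so if the text can contain several ranges the non-capturing group with scan is the better fit.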
For building and testing regular expressions in Ruby, http://rubular.com is a fantastic tool that makes it a lot easier. http://rubular.com/r/b1318BBimb has the edited regex along with a few test cases to confirm it matches them.
Upvotes: 1
Reputation: 83680
raw = "HbA1C ranging 8.0—10.0%"
raw.scan(/\d+\.\d+.+\d+\.\d+/)
#=> ["8.0\342\200\22410.0"] (Ruby 1.8 inspect output of "8.0—10.0")
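This pattern is looser than the one in the question: .+ accepts any separator and is greedy. The sketch below, assuming Ruby 1.9 or later, compares it with the non-capturing version of the question's regex (em dash written as \u2014); the sample strings other than the original one are hypothetical.

```ruby
# Assumes Ruby 1.9+. "loose" is this answer's pattern; "strict" is the
# question's pattern with a non-capturing group and \u2014 for the em dash.
loose  = /\d+\.\d+.+\d+\.\d+/
strict = /\d*\.?\d+[ ]*(?:-+|\u2014)[ ]*\d*\.?\d+/

# Hypothetical inputs besides the original: a range written with a word,
# and two hyphen ranges in one string.
dash_text  = "HbA1C ranging 8.0\u201410.0%"
word_text  = "HbA1C ranging 8.0 to 10.0%"
two_ranges = "8.0-10.0% then 6.5-7.0%"

dash_text.scan(loose)    # => ["8.0\u201410.0"]
word_text.scan(loose)    # => ["8.0 to 10.0"]  (.+ accepts any separator)
word_text.scan(strict)   # => []               (only hyphens or an em dash)
two_ranges.scan(loose)   # => ["8.0-10.0% then 6.5-7.0"]  (greedy .+ spans both)
two_ranges.scan(strict)  # => ["8.0-10.0", "6.5-7.0"]
```

The loose pattern is fine if the input only ever holds one range; otherwise the explicit separator keeps ranges apart.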
Upvotes: 0