Bamqf

Reputation: 3542

Regex matching of numbers in R

I'm learning regex matching in R using the stringr package, but I don't understand why

str_match("1,000,222.333 /month", "[\\d,]*\\.?\\d*")
     [,1]          
[1,] "1,000,222.333"

returns the desired result, while

str_match("about $1,000,222.33 em's", "[\\d,]*\\.?\\d*")
     [,1]
[1,] ""  

returns an empty string? Is something wrong with my [\\d,]*?

I've learned that matching numbers with a regex is complicated, so this snippet is not meant to be used in production; I just want to understand why it fails in this specific case.

Upvotes: 2

Views: 321

Answers (2)

hwnd

Reputation: 70722

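To elaborate, the problem is the * operator. Since it lets the regular expression engine match zero or more characters, [\d,]* tells the engine to match zero or more digits or literal commas, which might be none at all. Every part of your pattern is optional, so in the second string, which starts with "a", the pattern succeeds immediately with an empty match at the first position and the engine never scans ahead to the number. I would write this as follows: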

str_match(x, '[\\d,]+(?:\\.\\d+)?')
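To see the zero-length match directly, here is a minimal sketch in base R. It swaps in regexpr() and regmatches() with perl = TRUE in place of stringr (an assumption made only so that the snippet needs no packages); the variable names are illustrative.

```r
# Base R sketch; perl = TRUE enables \d. No packages required.
x <- "about $1,000,222.33 em's"

# Original pattern: every piece is optional, so it can match nothing at all.
m_star <- regexpr("[\\d,]*\\.?\\d*", x, perl = TRUE)
start_star  <- as.integer(m_star)            # where the match starts
length_star <- attr(m_star, "match.length")  # how many characters it covers
empty_match <- regmatches(x, m_star)         # the matched text

# With + the pattern needs at least one digit or comma,
# so the engine has to scan ahead until it finds one.
m_plus <- regexpr("[\\d,]+(?:\\.\\d+)?", x, perl = TRUE)
number_match <- regmatches(x, m_plus)

print(c(start = start_star, length = length_star))
print(number_match)
```

The first match starts at position 1 with length 0, which is the empty string from the question; the second pattern skips "about $" and picks up the number.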

Or make effective use of rm_number (a regex I wrote for this) from the qdapRegex package:

library(qdapRegex)

x <- c("about $1,000,222.33 em's", "1,000,222.333 /month")
rm_number(x, extract=TRUE)

# [[1]]
# [1] "1,000,222.33"

# [[2]]
# [1] "1,000,222.333"

Upvotes: 2

akrun

Reputation: 886948

You could use + to match one or more characters rather than *, which matches 0 or more.

 str_match(v1, "[\\d,]+\\.?\\d*")
 #    [,1]           
 #[1,] "1,000,222.33" 
 #[2,] "1,000,222.333"

data

 v1 <- c("about $1,000,222.33 em's", "1,000,222.333 /month")
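The two + patterns suggested in this thread give the same result on the question's data but can differ on other input. The sketch below is only an illustration of that difference: it uses base R (regexpr() with perl = TRUE) rather than stringr, and extract_first and the first test string are made up for the example.

```r
# Hypothetical helper: first match of `pattern` in each element of `x`,
# or NA where nothing matches. Base R only.
extract_first <- function(pattern, x) {
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)
  out
}

y <- c("total: 5. Next item", "1,000,222.333 /month")

# \\.?\\d* accepts a trailing dot with no digits after it.
loose <- extract_first("[\\d,]+\\.?\\d*", y)

# (?:\\.\\d+)? takes the dot only when digits follow it.
strict <- extract_first("[\\d,]+(?:\\.\\d+)?", y)

print(loose)
print(strict)
```

On the first string the looser pattern keeps the sentence-ending period ("5.") while the stricter one returns "5". Note that both still accept a lone comma, since [\\d,]+ does not require a digit.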

Upvotes: 3
