user6550364

Regex to remove all non-digit symbols from string in R

How can I extract the digits from strings of the form xxxx-x or xxxx.x-x and combine them into a single number? For example:

list <- c("1010.1-1", "1010.2-1", "1010.3-1", "1030-1", "1040-1",
          "1060.1-1", "1060.2-1", "1070-1", "1100.1-1", "1100.2-1")

The desired (numeric) output would be:

101011, 101021, 101031...

I tried

library(stringr)  # provides str_extract()
regexp <- "([[:digit:]]+)"
solution <- str_extract(list, regexp)

However, that only extracts the first run of digits, and using something like

regexp <- "([[:digit:]]+\\.[[:digit:]]+\\-[[:digit:]]+)"

returns the whole string unchanged when it matches, and NA for the shorter strings that have no .x part. Thoughts?

Upvotes: 2

Views: 4736

Answers (2)

Emiel Koning

Reputation: 4225

I have no experience with R, but I do know regular expressions. Looking at the pattern you're specifying, "([[:digit:]]+)", I assume [[:digit:]] stands for [0-9], so you're capturing one group of digits.

It seems to me you're missing a + to make it capture multiple groups of digits.

I'd think you'd need to use "([[:digit:]]+)+".
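
One caveat: a quantified group such as ([[:digit:]]+)+ still matches only one contiguous run of digits, because the "." and "-" separators are not digits. A minimal sketch of collecting every run and joining them, using only base R (the names x, runs and combined are made up for illustration):

```r
# Sample input copied from the question
x <- c("1010.1-1", "1030-1")

# gregexpr() locates every run of digits; regmatches() returns them as a list,
# one character vector per input string
runs <- regmatches(x, gregexpr("[[:digit:]]+", x))

# Join the runs of each string and convert the result to numeric
combined <- as.numeric(vapply(runs, paste, character(1), collapse = ""))
combined
```

This keeps the "extract the digit groups" idea, at the cost of an extra step compared with deleting the non-digits directly.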

Upvotes: 0

Wiktor Stribiżew

Reputation: 626738

Remove all non-digit symbols:

list <- c("1010.1-1", "1010.2-1", "1010.3-1", "1030-1", "1040-1", "1060.1-1", "1060.2-1", "1070-1", "1100.1-1", "1100.2-1")
as.numeric(gsub("\\D+", "", list))
## => [1] 101011 101021 101031  10301  10401 106011 106021  10701 110011 110021

See the R demo online
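
If the same cleanup is needed in several places, the gsub() call can be wrapped in a small helper. This is only a sketch: the function name digits_only and the sample vector ids are hypothetical, and the pattern assumes ASCII digits.

```r
# Hypothetical helper wrapping the gsub() approach
digits_only <- function(x) {
  # \\D+ matches one or more non-digit characters; replace each run with ""
  cleaned <- gsub("\\D+", "", x)
  # A string with no digits at all becomes "", which as.numeric() turns into NA
  as.numeric(cleaned)
}

ids <- c("1010.1-1", "1030-1", "abc")
digits_only(ids)
```

Note that as.numeric() drops leading zeros (for example, "0010-1" becomes 101), so keep the result of gsub() as a character vector if the leading zeros matter.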

Upvotes: 7
