vestland

Reputation: 61154

How do I retrieve all numbers in a string and combine them into one number using regex?

This should be pretty easy, but the results after using suggestions from other SO posts leave me baffled. And, of course, I'd like to avoid using a For loop.

Reproducible example

library(stringr)
input <- "<77 500 miles</dd>"
mynumbers <- str_extract_all(input, "[0-9]")

The variable mynumbers is a list containing a single character vector of five digits:

> mynumbers
[[1]]
[1] "7" "7" "5" "0" "0"

But this is what I'm after:

> mynumbers
[1] 77500

This post suggests using paste(), and I guess this should work fine given the correct sep and collapse arguments, but I have got to be missing something essential here. I have also tried to use unlist(). Here is what I've tried so far:

1 - using paste()

> paste(mynumbers)
[1] "c(\"7\", \"7\", \"5\", \"0\", \"0\")"

2 - using paste()

> paste(mynumbers, sep = " ")
[1] "c(\"7\", \"7\", \"5\", \"0\", \"0\")"

3 - using paste()

> paste (mynumbers, sep = " ", collapse = NULL)
[1] "c(\"7\", \"7\", \"5\", \"0\", \"0\")"

4 - using paste()

> paste (mynumbers, sep = "", collapse = NULL)
[1] "c(\"7\", \"7\", \"5\", \"0\", \"0\")"

5 - using unlist()

> as.numeric(unlist(mynumbers))
[1] 7 7 5 0 0

I'm hoping some of you have a few suggestions. I guess there's an elegant solution using regex somehow, but I'm also very interested in the paste / unlist problem that is specific to R. Thanks!

Upvotes: 6

Views: 1253

Answers (2)

Matt Rosinski

Reputation: 53

An alternative using the stringr package:

str_remove_all(input, pattern = "\\D+") %>% as.numeric()
[1] 77500

Upvotes: 2

akrun

Reputation: 887571

str_extract_all() returns a list, so the character vector has to be pulled out before pasting. Because the list has only one element, mynumbers[[1]] returns that vector. Then paste() with collapse = "" joins the digits into a single string, and as.numeric() converts it to a number.

as.numeric(paste(mynumbers[[1]],collapse=""))
#[1] 77500
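
To illustrate why the paste() attempts in the question returned "c(...)" strings, here is a minimal base R sketch. The list is built by hand as a stand-in for the str_extract_all() result (that stand-in is an assumption made so the snippet runs without stringr); the variable names other than mynumbers are illustrative only.

```r
# Stand-in for str_extract_all(input, "[0-9]"): a list holding one
# character vector (built by hand here, an assumption for illustration).
mynumbers <- list(c("7", "7", "5", "0", "0"))

# paste() on the list itself converts the whole element into one string,
# which is where the "c(\"7\", ...)" output comes from.
as_list <- paste(mynumbers, collapse = "")

# Extracting the vector first lets collapse join its elements.
digits <- mynumbers[[1]]
joined <- paste(digits, collapse = "")

# sep only separates the arguments passed to paste(); with a single
# vector argument the result is still five separate strings.
with_sep <- paste(digits, sep = "")

# unlist() flattens the list just as well as [[1]] when there is one element.
result <- as.numeric(paste(unlist(mynumbers), collapse = ""))
print(result)
#[1] 77500
```

So the missing piece was twofold: the list has to be unwrapped, and collapse rather than sep is what merges the elements of one vector.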

We can also match one or more non-digit characters (\\D+), replace them with "" using gsub(), and convert the result to numeric.

as.numeric(gsub("\\D+", "", input))
#[1] 77500
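
Since gsub() is vectorized, the same call should work on several strings at once without a for loop. A small sketch, where the second and third inputs are made up for illustration and are not from the question:

```r
# The first string is the input from the question; the other two are
# hypothetical examples added to show the vectorized behaviour.
inputs <- c("<77 500 miles</dd>", "<1 200 miles</dd>", "no digits here")

# Strip every run of non-digit characters from each string.
cleaned <- gsub("\\D+", "", inputs)

# Convert to numeric; a string with no digits becomes "" and then NA.
values <- as.numeric(cleaned)
print(values)
```

Note that this removes every non-digit, so decimal points and minus signs would be dropped as well.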

Upvotes: 10
