Cyrus Mohammadian

Reputation: 5193

Remove everything except period and numbers from string regex in R

I know there are many questions on Stack Overflow about regex, but I cannot accomplish this one easy task with the help I've found. Here's my data:

a<-c("Los Angeles, CA","New York, NY", "San Jose, CA")
b<-c("c(34.0522, 118.2437)","c(40.7128, 74.0059)","c(37.3382, 121.8863)")

df<-data.frame(a,b)
df
                a                    b
1 Los Angeles, CA c(34.0522, 118.2437)
2    New York, NY  c(40.7128, 74.0059)
3    San Jose, CA c(37.3382, 121.8863)

I would like to remove everything but the numbers and the period (i.e. remove "c", ")" and "("). This is what I've tried thus far:

str_replace(df$b,"[^0-9.]","" )
[1] "(34.0522, 118.2437)" "(40.7128, 74.0059)"  "(37.3382, 121.8863)"

str_replace(df$b,"[^\\d\\)]+","" )
[1] "34.0522, 118.2437)" "40.7128, 74.0059)"  "37.3382, 121.8863)"

Not sure what's left to try. I would like to end up with the following:

 [1] "34.0522, 118.2437" "40.7128, 74.0059"  "37.3382, 121.8863"

Thanks.

Upvotes: 5

Views: 9434

Answers (4)

akrun

Reputation: 887951

Here is another option with str_extract_all from stringr. Extract the numeric parts into a list with str_extract_all, convert each element to numeric, rbind the list elements, and cbind the result with the first column of 'df'.

library(stringr)
cbind(df[1], do.call(rbind, 
      lapply(str_extract_all(df$b, "[0-9.]+"), as.numeric)))
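For readers without stringr installed, the same extract-then-bind idea can be sketched in base R. This is only an illustration, not part of the answer above: it assumes every element holds exactly two numbers, and the column names lat and lon are hypothetical labels added here.

```r
# Same data as in the question
b <- c("c(34.0522, 118.2437)", "c(40.7128, 74.0059)", "c(37.3382, 121.8863)")

# gregexpr() locates every run of digits and periods;
# regmatches() returns those runs as a list of character vectors
parts <- regmatches(b, gregexpr("[0-9.]+", b))

# Convert each element to numeric and stack the results row by row
coords <- do.call(rbind, lapply(parts, as.numeric))
colnames(coords) <- c("lat", "lon")  # hypothetical names, not from the question
```

The resulting matrix can be combined with the first column of df via cbind, as in the stringr version.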

Upvotes: 1

hvollmeier

Reputation: 2986

If I understand you correctly, this is what you want:

df$b <- gsub("[^[:digit:]., ]", "", df$b)

or:

df$b <- strsplit(gsub("[^[:digit:]. ]", "", df$b), " +")
> df
                a                 b
1 Los Angeles, CA 34.0522, 118.2437
2    New York, NY  40.7128, 74.0059
3    San Jose, CA 37.3382, 121.8863

or if you want all the "numbers" as a numeric vector:

as.numeric(unlist(strsplit(gsub("[^[:digit:]. ]", "", df$b), " +")))
[1]  34.0522 118.2437  40.7128  74.0059  37.3382 121.8863
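A likely reason the gsub calls here succeed where the attempts in the question stopped short: stringr's str_replace replaces only the first match in each string, while str_replace_all (like base gsub) replaces every match. The base R pair sub and gsub shows the same difference. A small sketch, using a pattern that also keeps the comma and space:

```r
b <- c("c(34.0522, 118.2437)", "c(40.7128, 74.0059)", "c(37.3382, 121.8863)")

# sub() stops after the first match, so only the leading "c" is removed
first_only <- sub("[^0-9., ]", "", b)

# gsub() removes every character that is not a digit, period, comma or space
all_matches <- gsub("[^0-9., ]", "", b)
```

Here first_only still contains the parentheses, while all_matches is the output the question asks for.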

Upvotes: 11

user2100721

Reputation: 3597

Try this

gsub("[c()]", "", df$b)
#[1] "34.0522, 118.2437" "40.7128, 74.0059"  "37.3382, 121.8863"

Upvotes: 3

Richie Cotton

Reputation: 121177

Not a regular expression solution, but a simple one.

The elements of b are R expressions, so loop over each element, parse and evaluate it, then build the string you want from the result.

vapply(
  b, 
  function(bi) 
  {
    toString(eval(parse(text = bi)))
  }, 
  character(1)
)
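One caveat worth keeping in mind: eval(parse()) runs whatever code the string contains, so it is only safe on trusted input. A possible guard, sketched below under the assumption that every element should look like c(<numbers>), checks the shape of each string before evaluating it. The helper name looks_like_numeric_c is hypothetical.

```r
b <- c("c(34.0522, 118.2437)", "c(40.7128, 74.0059)", "c(37.3382, 121.8863)")

# Hypothetical helper: TRUE only for strings of the form c(<digits, periods, commas, spaces>)
looks_like_numeric_c <- function(x) grepl("^c\\([0-9., ]+\\)$", x)

# Refuse to evaluate anything that does not match the expected shape
stopifnot(all(looks_like_numeric_c(b)))

out <- vapply(
  b,
  function(bi) toString(eval(parse(text = bi))),
  character(1),
  USE.NAMES = FALSE  # drop the input strings as names
)
```

USE.NAMES = FALSE is a small departure from the call above: it returns a plain character vector instead of one named by the input strings.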

Upvotes: 2
