Reputation: 303

Extracting values from a string in R using regex

I'm trying to extract the first and second numbers of this string and store them in separate variables.

(User20,10.25)

I can't figure out how to get the user number and then the value that follows it.

What I have managed to do so far is this, but I don't know how to remove the rest of the string and get only the number.

gsub("\\(User", "", string)

Upvotes: 0

Answers (4)

Reputation: 887118

Try

str1 <- '(User20,10.25)'
scan(text=gsub('[^0-9.-]+', ' ', str1),quiet=TRUE) 
#[1] 20.00 10.25

If the string also contains hyphens that are not minus signs, for example

str2 <- '(User20-ht,-10.25)'
scan(text=gsub('-(?=[^0-9])|[^0-9.-]+', " ", str2, perl=TRUE), quiet=TRUE)
#[1]  20.00 -10.25

Or using stringr

library(stringr)
str_extract_all(str1, '[0-9.-]+')[[1]]
#[1] "20"    "10.25"

Or using stringi

library(stringi)
stri_extract_all_regex(str1, '[0-9.-]+')[[1]]
#[1] "20"    "10.25"
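Since the question asks for the two numbers in separate variables, here is a minimal sketch that builds on the scan approach above. The variable names user_id and value are only illustrative, and the input is assumed to have exactly the form shown in the question.

```r
# Input string taken from the question
str1 <- '(User20,10.25)'

# Replace everything that is not a digit, dot or minus sign with a space,
# then let scan() parse the remaining tokens as numbers
nums <- scan(text = gsub('[^0-9.-]+', ' ', str1), quiet = TRUE)

# Store each number in its own variable (names chosen for illustration)
user_id <- nums[1]  # 20
value <- nums[2]    # 10.25
```

Note that scan returns doubles, so user_id is 20 as a numeric; wrap it in as.integer() if an integer is needed.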

Upvotes: 6

Reputation: 70732

You can use strsplit with sub (here x holds the string '(User20,10.25)') ...

> sub('\\(User|\\)', '', strsplit(x, ',')[[1]])
[1] "20"    "10.25"

It would probably be easier to match the part you want directly instead of removing what you don't.

> regmatches(x, gregexpr('[0-9.]+', x))[[1]]
[1] "20"    "10.25"
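If the surrounding format is fixed, another way to match only what you want is to use capture groups with regexec. This is a sketch that assumes the string always looks like (User<digits>,<number>); the pattern and the names user_id and value are illustrative.

```r
# Input string taken from the question
x <- '(User20,10.25)'

# One capture group for the user number, one for the (possibly negative) value
pattern <- '\\(User([0-9]+),(-?[0-9.]+)\\)'

# regexec() returns the full match followed by each capture group
m <- regmatches(x, regexec(pattern, x))[[1]]

# m[1] is the whole match; the groups start at m[2]
user_id <- as.integer(m[2])  # 20
value <- as.numeric(m[3])    # 10.25
```

Because the pattern anchors on the literal "(User" and the comma, a string in a different shape yields an empty match rather than stray numbers.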

Upvotes: 4

Reputation: 1

One approach is a negated character class that matches runs of characters other than commas, parentheses, and letters:

[^,()A-Za-z]+
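A sketch of how a negated character class of this kind might be applied in R. It assumes the goal is to skip commas, parentheses and letters of either case (written here as [^,()A-Za-z]+) and that x holds the question's string.

```r
# Input string taken from the question
x <- '(User20,10.25)'

# Keep every run of characters that are not commas, parentheses or letters
parts <- regmatches(x, gregexpr('[^,()A-Za-z]+', x))[[1]]

# Convert the extracted pieces from character to numeric
nums <- as.numeric(parts)
```

Here parts holds the character strings "20" and "10.25", and nums holds the corresponding numbers.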

Upvotes: 0

Reputation: 193517

Tyler Rinker's "qdapRegex" package has some functions that are useful for this kind of stuff.

In this case, you would most likely be interested in rm_number:

library(qdapRegex)
rm_number(x, extract = TRUE)
# [[1]]
# [1] "20"    "10.25"

Upvotes: 5