Reputation: 401
I'm trying to get the numbers (including decimals) from a string. My data is similar to this:
V <- c("7.20-<7.35","25-<32","60-<83e","40-<50","0.85-<1.15","80-<98","3.0-<3.4","NA","3.0-<3.4 (110)")
Where numbers are mixed with letters and symbols. I need to extract the first number after the < symbol while keeping the index for the missing values. My output would look like:
desired.output <- c(7.35, 32, 83, 50, 1.15, 98, 3.4, NA, 3.4)
I have tried:
resp <- as.numeric(unlist(regmatches(V,
gregexpr("[[:digit:]]+\\.*[[:digit:]]*",V))))
and
resp <- sub(".*<(^[^-])", "\\1", V)
and other patterns in the sub function, but nothing seems to work.
What do you suggest as the best approach?
Upvotes: 2
Views: 813
Reputation: 163642
You could also match the first number followed by -< and capture the second number, with an optional decimal part.
\d+(?:\.\d+)?-<(\d+(?:\.\d+)?).*
The pattern matches:
\d+(?:\.\d+)? - Match 1+ digits with an optional decimal part
-< - Match literally
( - Capture group 1
\d+(?:\.\d+)? - Match 1+ digits with an optional decimal part
) - Close group 1
.* - Match the rest of the line
The trailing .* matches the rest of the string that you don't want in the result, so you can replace the whole match with group 1.
V <- c("7.20-<7.35","25-<32","60-<83e","40-<50","0.85-<1.15","80-<98","3.0-<3.4","NA","3.0-<3.4 (110)")
sub("\\d+(?:\\.\\d+)?-<(\\d+(?:\\.\\d+)?).*", "\\1", V)
Output
[1] "7.35" "32" "83" "50" "1.15" "98" "3.4" "NA" "3.4"
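Note that sub returns a character vector, while the desired output in the question is numeric. A minimal sketch of the extra conversion step, assuming the same V as above (the names extracted and resp are only illustrative, and perl = TRUE is passed here just to make the regex flavour explicit):

```r
V <- c("7.20-<7.35","25-<32","60-<83e","40-<50","0.85-<1.15","80-<98","3.0-<3.4","NA","3.0-<3.4 (110)")

# Keep only the captured number after "-<"; strings without a match ("NA") are returned unchanged
extracted <- sub("\\d+(?:\\.\\d+)?-<(\\d+(?:\\.\\d+)?).*", "\\1", V, perl = TRUE)

# The string "NA" converts to a numeric NA, so the missing value keeps its position
resp <- as.numeric(extracted)
```

Because every element of V yields exactly one element in the result, the index of the missing value is preserved.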
To match all variations of -, < or >, you can use a character class listing all the allowed characters and repeat them 1 or more times:
sub("\\d+(?:\\.\\d+)?[<>-]+(\\d+(?:\\.\\d+)?).*", "\\1", V)
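As a quick illustration of the character class version, here is a small sketch on made-up input. The vector W is hypothetical and not part of the question's data; it only shows a few separator variations:

```r
# Hypothetical strings using different separators between the two numbers
W <- c("7.20-<7.35", "25->32", "40<50")

# [<>-]+ accepts any run of "<", ">" or "-" between the first and second number
out <- sub("\\d+(?:\\.\\d+)?[<>-]+(\\d+(?:\\.\\d+)?).*", "\\1", W, perl = TRUE)
out
```

The hyphen is placed last inside the brackets so that it is read as a literal character rather than as a range.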
Upvotes: 0
Reputation: 931
Using str_extract from the stringr package in the tidyverse:
library(tidyverse)
V <- c("7.20-<7.35","25-<32","60-<83e","40-<50","0.85-<1.15","80-<98","3.0-<3.4","NA","3.0-<3.4 (110)")
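# Lookbehind for "<", then 1+ digits with an optional decimal part; a literal "NA" string is matched as-is
str_extract(V, "(?<=<)\\d+(?:\\.\\d+)?|NA") %>%
as.numeric()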
[1] 7.35 32.00 83.00 50.00 1.15 98.00 3.40 NA 3.40
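For comparison, the same lookbehind idea can probably be written in base R as well. This is only a sketch, assuming the same V; the names m and resp are illustrative. It also shows why the regmatches attempt in the question lost the position of the missing value: regmatches drops the elements that have no match, so the results have to be put back by index.

```r
V <- c("7.20-<7.35","25-<32","60-<83e","40-<50","0.85-<1.15","80-<98","3.0-<3.4","NA","3.0-<3.4 (110)")

# Position of the first number preceded by "<" in each string (-1 where there is none)
m <- regexpr("(?<=<)\\d+(?:\\.\\d+)?", V, perl = TRUE)

# Start with all NA, then fill in only the positions that had a match
resp <- rep(NA_real_, length(V))
resp[m != -1] <- as.numeric(regmatches(V, m))
```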
Upvotes: 0
Reputation: 627607
You can use
sub(".*<(\\d+(?:\\.\\d+)?).*", "\\1", V, perl=TRUE)
# => [1] "7.35" "32" "83" "50" "1.15" "98" "3.4" "NA" "3.4"
See the online R demo and the regex demo. Replace \\d+(?:\\.\\d+)? with \\d*\\.?\\d+ if you also need to get numbers like .05. Add -? before the first \\d+ if you also need to get negative numbers.
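A small sketch combining both variants (leading-dot decimals and an optional minus sign), run on made-up input. The vector X is hypothetical and only serves to exercise those two cases:

```r
# Hypothetical strings with a leading-dot decimal and a negative number after "<"
X <- c("0.1-<.05", "-3-<-2.5", "25-<32")

# -? allows an optional minus sign; \\d*\\.?\\d+ also accepts numbers like .05
out <- sub(".*<(-?\\d*\\.?\\d+).*", "\\1", X, perl = TRUE)
as.numeric(out)
```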
Details:
.* - any zero or more chars other than line break chars, as many as possible
< - a < char
(\d+(?:\.\d+)?) - Group 1 (referred to with \1 from the replacement pattern): one or more digits followed with an optional sequence of a dot and one or more digits
.* - any zero or more chars other than line break chars, as many as possible
Upvotes: 1