Tammy

Reputation: 173

Pulling out numbers before a phrase

I'm struggling to use regex so any insight would be helpful. I have a list like this:

[1] "collected 1 hr total. wind >15 mph."    "collected 4 hr total. wind ~15 mph."
[3] "collected 10 hr total. gusts 5-10 mph." "collected 1 hr total. breeze at 1mph,"
[5] "collected 2 hrs."    [6]

I want:

 [1] >15 mph
 [2] ~15 mph
 [3] 5-10 mph
 [4] 1mph
 [5] 
 [6]

That is, I want to pull out the wind speed from each row. Can you suggest the correct regex? As you can see, a) there can be a variable number of spaces between the digits and "mph", and b) the digits before "mph" can be preceded by different symbols (">", "<", "~") or can form an interval with "-".

Thank you in advance!

Upvotes: 2

Views: 38

Answers (2)

akrun

Reputation: 887391

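One option is str_extract from stringr:

library(stringr)
# Optional leading symbol (<, > or ~), then digits, hyphens or spaces, then "mph".
# trimws() drops the leading space that the character class can capture.
trimws(str_extract(v1, "[<>~]?[0-9- ]+mph"))
#[1] ">15 mph"   "~15 mph"   "5-10 mph" "1mph"     NA     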

data

v1 <- c("collected 1 hr total. wind >15 mph.",
        "collected 4 hr total. wind ~15 mph.",
        "collected 10 hr total. gusts 5-10 mph.",
        "collected 1 hr total. breeze at 1mph,",
        "collected 2 hrs.")
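
For comparison, here is a minimal base R sketch of the same idea using regexpr and regmatches, so no extra package is needed. The pattern is my own variation, not the one from the answer: it spells out the optional symbol (including "<", which the question mentions), the digits, and an optional "-digits" interval. The sixth string is a hypothetical "<" case added only for illustration.

```r
# Sample data from above, plus one hypothetical "<" case for illustration.
v1 <- c("collected 1 hr total. wind >15 mph.",
        "collected 4 hr total. wind ~15 mph.",
        "collected 10 hr total. gusts 5-10 mph.",
        "collected 1 hr total. breeze at 1mph,",
        "collected 2 hrs.",
        "collected 3 hr total. wind <5 mph.")

# Optional symbol (with optional spaces after it), digits,
# optional "-digits" interval, any amount of whitespace, then "mph".
pattern <- "([<>~]\\s*)?[0-9]+(-[0-9]+)?\\s*mph"

hits <- regexpr(pattern, v1)                  # -1 where nothing matches
speeds <- rep(NA_character_, length(v1))      # default to NA for no match
speeds[hits != -1] <- regmatches(v1, hits)    # fill in the matched text
speeds
```

Because the pattern starts at the symbol or the first digit, there is no leading space to trim afterwards.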

Upvotes: 1

Tim Biegeleisen

Reputation: 521794

Assuming that each string has at most one matching term, we can use sapply along with sub:

input <- c("collected 1 hr total. wind >15 mph.",
           "collected 4 hr total. wind ~15 mph.",
           "collected 10 hr total. gusts 5-10 mph.",
           "collected 1 hr total. breeze at 1mph,",
           "collected 2 hrs.")

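# For each string: if it contains a speed, keep only the captured term;
# otherwise return an empty string. "<" is included since the question mentions it.
matches <- sapply(input, function(x) {
    ifelse(grepl("[<>~0-9-]+\\s*mph", x),
           sub(".*?([<>~0-9-]+\\s*mph).*", "\\1", x),
           "")})

# Replace the default names (the input strings) with simple indices.
names(matches) <- seq_along(matches)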
matches

         1          2          3          4          5 
 ">15 mph"  "~15 mph" "5-10 mph"     "1mph"         "" 
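
If the one-match-per-string assumption does not hold, a rough base R sketch with gregexpr can collect every match instead. The input below is hypothetical (the second string reports two speeds), and the pattern is my own variation rather than the one used above.

```r
# Hypothetical input: the second string contains two wind speeds.
input <- c("collected 1 hr total. wind >15 mph.",
           "wind ~15 mph at start, gusts 5-10 mph later.",
           "collected 2 hrs.")

# Optional symbol, digits, optional "-digits" interval, whitespace, "mph".
pattern <- "([<>~]\\s*)?[0-9]+(-[0-9]+)?\\s*mph"

# gregexpr() finds every match per string; regmatches() returns a list with
# one character vector per input element (character(0) when nothing matches).
all_matches <- regmatches(input, gregexpr(pattern, input))
all_matches
```

The result is a list rather than a character vector, so strings with several speeds keep all of them.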

Upvotes: 1
