LoF10

R: How to search for a string pattern and extract a custom number of characters around that location?

I am looking to extract a pattern and then a custom number of characters to the left or right of that pattern. I believe this is possible with regex, but I am unsure how to proceed. Below is an example of the data and the output I am looking for:

library(data.table)

# my data set
df = data.table(
  event = c(1, 2, 3),
  notes = c("watch this movie from 4-7pm",
            "watch this musical from 5-9pm",
            "eat breakfast at this place from 7-9am")
)

# How do I point R to a part of a string and then pull the characters around it?

# Example:
grepl("pm|am", df$notes)  # tells me which notes contain the keywords, but how do I tell R to
# locate that word and then pull N characters to the left or right of it, like substr()?

# Desired output:
# "4-7pm", "5-9pm", "7-9am"

# Right now I can extract the pattern itself:
library(stringr)
str_extract(df$notes, "pm")
# but I also want to pull the characters to the left or right of it.
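One way to do the "locate, then pull N characters around it" step in base R is to combine regexpr(), which returns the start position and length of the first match, with substr(). This is a minimal sketch that assumes you only need the first match in each note; extract_around() is a hypothetical helper name, not a function from data.table or stringr.

```r
# Hypothetical helper (not from any package): find the first match of
# `pattern` in each string and return it together with `left` characters
# before it and `right` characters after it. Notes without a match give NA.
extract_around <- function(x, pattern, left = 0, right = 0) {
  m <- regexpr(pattern, x)            # start of first match, -1 if none
  len <- attr(m, "match.length")      # length of each match
  m <- as.integer(m)                  # drop the attributes
  start <- pmax(m - left, 1)          # don't run past the start of the string
  end <- m + len - 1 + right          # substr() clips this at nchar(x)
  out <- substr(x, start, end)
  out[m == -1] <- NA_character_       # no match -> NA
  out
}

notes <- c("watch this movie from 4-7pm",
           "watch this musical from 5-9pm",
           "eat breakfast at this place from 7-9am")

extract_around(notes, "am|pm", left = 3)   # 3 characters to the left
#> [1] "4-7pm" "5-9pm" "7-9am"

extract_around(notes, "from", right = 6)   # 6 characters to the right
#> [1] "from 4-7pm" "from 5-9pm" "from 7-9am"
```

A fixed character count is fragile when hour ranges vary in width (e.g. "12-7pm"), so a pattern that describes the whole time range is usually more robust than a fixed radius.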

Answers (2)

Sonny

In your case, the following may be enough:

sapply(df$notes, function(x) {
  grep("am|pm", unlist(strsplit(x, " ")), value = TRUE)
}, USE.NAMES = FALSE)
[1] "4-7pm" "5-9pm" "7-9am"

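However, this can still fail on edge cases: grep("am|pm", ...) matches any word that merely contains "am" or "pm" (e.g. "example" or "program"), and if a note has zero or several matches, sapply() may return a list instead of a character vector. You can also use a regex to extract all words ending with am or pm.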
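Following the regex suggestion, here is a base-R sketch (not part of the original answer) that pulls every whitespace-delimited word ending in am or pm. What counts as a "word" here (a run of non-space characters) is an assumption; the pattern would still match ordinary words such as "program", so require digits in the pattern if that matters.

```r
notes <- c("watch this movie from 4-7pm",
           "watch this musical from 5-9pm",
           "eat breakfast at this place from 7-9am")

# \\S+       one or more non-space characters (the "word")
# [ap]m\\b   ...ending in "am" or "pm" at a word boundary
pattern <- "\\S+[ap]m\\b"

# gregexpr() finds every match; regmatches() returns one character vector per note
times <- regmatches(notes, gregexpr(pattern, notes, perl = TRUE))
unlist(times)
#> [1] "4-7pm" "5-9pm" "7-9am"

# Several times in a single note are all returned; trailing punctuation is excluded
one_note <- "lunch 12-1pm, dinner 6-8pm"
multi <- regmatches(one_note, gregexpr(pattern, one_note, perl = TRUE))[[1]]
multi
#> [1] "12-1pm" "6-8pm"
```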

You can also use stringr to locate the matched characters and then build a window (radius) around them:

stringr::str_locate(df$notes, "am|pm")
     start end
[1,]    26  27
[2,]    28  29
[3,]    37  38
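To build the radius from that location matrix, subtract the number of characters you want on the left from start, add the number you want on the right to end, and cut with substr(). This minimal base-R sketch copies the matrix printed above so it runs without stringr; in real code you would use the str_locate() result directly. n_left and n_right are illustrative names.

```r
notes <- c("watch this movie from 4-7pm",
           "watch this musical from 5-9pm",
           "eat breakfast at this place from 7-9am")

# Location matrix as printed by stringr::str_locate(notes, "am|pm");
# hard-coded here so the sketch runs without stringr installed.
loc <- cbind(start = c(26, 28, 37), end = c(27, 29, 38))

n_left  <- 3  # characters to keep before the match
n_right <- 0  # characters to keep after the match

# Widen the window, clamping the start so it never goes below 1
around <- substr(notes, pmax(loc[, "start"] - n_left, 1), loc[, "end"] + n_right)
around
#> [1] "4-7pm" "5-9pm" "7-9am"
```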

Andrew

Using stringr, you could do something like the following. With the matrix of match locations, you can shift the start and end positions to capture whatever window of text you are looking for:

library(stringr)

# Extracting locations
locations <- str_locate(df$notes, "\\d+\\-\\d+pm|\\d+\\-\\d+am")

# Using substring to pull the info you want
str_sub(df$notes, locations)

[1] "12-7pm" "5-9pm"  "7-9am"
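If you would rather not depend on stringr, base R can do the same locate-and-cut step with regexpr() and regmatches(). This sketch also collapses the alternation into the equivalent character class [ap]m and uses the modified data from this answer (12-7pm). Note that regmatches() silently drops notes with no match, so check the lengths if some notes may not contain a time.

```r
notes <- c("watch this movie from 12-7pm",
           "watch this musical from 5-9pm",
           "eat breakfast at this place from 7-9am")

# Same pattern as above, with "pm|am" collapsed into the class [ap]m
m <- regexpr("\\d+-\\d+[ap]m", notes, perl = TRUE)

# regmatches() returns the matched substrings (notes without a match are dropped)
times <- regmatches(notes, m)
times
#> [1] "12-7pm" "5-9pm"  "7-9am"
```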

Data (I swapped out 4 for 12):

df = data.table(
  event = c(1,2,3),
  notes = c("watch this movie from 12-7pm",
            "watch this musical from 5-9pm",
            "eat breakfast at this place from 7-9am")

)
