Reputation: 2127
I am looking to extract a pattern and then a custom number of characters to the left or right of that pattern. I believe this is possible with regex, but I am unsure how to proceed. Below is an example of the data and the output I am looking for:
library(data.table)
#my data set
df = data.table(
event = c(1,2,3),
notes = c("watch this movie from 4-7pm",
"watch this musical from 5-9pm",
"eat breakfast at this place from 7-9am")
)
#how do I point R to a string section and then pull characters around it?
#example:
grepl('pm|am', df$notes) # tells me which notes contain these keywords, but how can I tell R to
#locate that word and then pull N characters to the left or right of it, like substr()?
#output would be
#'4-7pm', '5-9pm', '7-9am'
#right now I can extract the pattern:
library(stringr)
str_extract(df$notes, "pm")
#but I also want to then pull things to the left or right of it.
Upvotes: 0
Views: 184
Reputation: 3183
In your case, the following may be enough:
# split each note into words and keep the word(s) containing "am" or "pm"
sapply(df$notes, function(x) {
  grep("am|pm", unlist(strsplit(x, " ")), value = TRUE)
}, USE.NAMES = FALSE)
[1] "4-7pm" "5-9pm" "7-9am"
However, this can still fail on edge cases: grep("am|pm", ...) also keeps any other word that contains "am" or "pm", such as "team". You can also use a regex to extract all words ending with am or pm.
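A minimal sketch of that regex route with stringr's str_extract(). The pattern \\S+[ap]m\\b is an assumption of this sketch, not part of the original answer: it takes the first run of non-space characters that ends in "am" or "pm" at a word boundary. Like the grep() version, it would still match an ordinary word such as "team".

```r
library(stringr)

# The notes column from the question, as a plain character vector
notes <- c("watch this movie from 4-7pm",
           "watch this musical from 5-9pm",
           "eat breakfast at this place from 7-9am")

# First whitespace-delimited token that ends in "am" or "pm"
times <- str_extract(notes, "\\S+[ap]m\\b")
print(times)
#> [1] "4-7pm" "5-9pm" "7-9am"
```

If a note can contain several times, str_extract_all() with the same pattern returns every match per note as a list.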
Look at stringr's str_locate() to find the positions of the matched characters and build a radius around them:
stringr::str_locate(df$notes, "am|pm")
start end
[1,] 26 27
[2,] 28 29
[3,] 37 38
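The radius idea can be sketched by feeding the start/end matrix from str_locate() into str_sub() after widening it. The helper name extract_around() and its left/right arguments are hypothetical, chosen for illustration; the window is counted in characters, so this assumes you know roughly how wide the text you want is.

```r
library(stringr)

notes <- c("watch this movie from 4-7pm",
           "watch this musical from 5-9pm",
           "eat breakfast at this place from 7-9am")

# Hypothetical helper: first match of `pattern` plus up to `left`
# characters before it and up to `right` characters after it
extract_around <- function(x, pattern, left = 0, right = 0) {
  loc <- str_locate(x, pattern)              # matrix with columns "start", "end"
  # pmax() stops the window at the first character; str_sub() would
  # otherwise read negative positions as counting from the end
  start <- pmax(loc[, "start"] - left, 1)
  # str_sub() just stops at the last character if `end` runs past it
  end <- loc[, "end"] + right
  str_sub(x, start, end)                     # NA where there was no match
}

before <- extract_around(notes, "am|pm", left = 3)
print(before)
#> [1] "4-7pm" "5-9pm" "7-9am"

after <- extract_around(notes, "from", right = 6)
print(after)
#> [1] "from 4-7pm" "from 5-9pm" "from 7-9am"
```

A fixed window breaks as soon as the width changes (an hour such as "12-7pm" would need left = 4), so matching the whole time range with a pattern such as \\d+-\\d+[ap]m is usually more robust.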
Upvotes: 2
Reputation: 5138
Using stringr, you could do something like this. With the matrix of locations, you can shift the start and end positions to widen or narrow the window to whatever you are looking for:
library(stringr)
# Extracting locations
locations <- str_locate(df$notes, "\\d+\\-\\d+pm|\\d+\\-\\d+am")
# Using substring to pull the info you want
str_sub(df$notes, locations)
[1] "12-7pm" "5-9pm" "7-9am"
Data (I swapped out 4 for 12):
df = data.table(
event = c(1,2,3),
notes = c("watch this movie from 12-7pm",
"watch this musical from 5-9pm",
"eat breakfast at this place from 7-9am")
)
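As a possible simplification (not part of the original answer), the two alternatives in the pattern can be folded into one character class, and str_extract() locates and substrings in a single call. A base R equivalent with regexpr() and regmatches() is shown for comparison; note that regmatches() silently drops notes with no match, whereas str_extract() returns NA for them, so the two only line up when every note matches.

```r
library(stringr)

# The modified data from this answer (12-7pm instead of 4-7pm)
notes <- c("watch this movie from 12-7pm",
           "watch this musical from 5-9pm",
           "eat breakfast at this place from 7-9am")

# One or more digits, a hyphen, one or more digits, then "am" or "pm"
time_pattern <- "[0-9]+-[0-9]+[ap]m"

# stringr: locate and extract in one step (NA where nothing matches)
stringr_times <- str_extract(notes, time_pattern)
print(stringr_times)

# base R: regexpr() finds the first match, regmatches() pulls it out
base_times <- regmatches(notes, regexpr(time_pattern, notes))
print(base_times)
```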
Upvotes: 0