Reputation: 69
I have a data.table with a "message" column.
I need to extract the messages that follow this pattern:
"THISIsNotImportant: THIS_IS_IMPORTANT Rest of the Message"
How do I extract the messages matching this pattern and store the bold segment (THIS_IS_IMPORTANT) in a vector?
Upvotes: 0
Views: 247
Reputation: 389325
Using sub in base R:
x <- "THISIsNotImportant: THIS_IS_IMPORTANT Rest of the Message"
sub('.*:\\s([A-Z_]+).*', '\\1', x)
#[1] "THIS_IS_IMPORTANT"
To add this as a new column in the data.table for all rows, you can do:
library(data.table)
dt[, imp_message := sub('.*:\\s([A-Z_]+).*', '\\1', message)]
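Note that sub returns the input unchanged when the pattern does not match, so rows without the pattern would keep their full message in imp_message. If you only want the matching messages, and want the result as a plain vector as the question asks, one option is to filter with grepl first. This is a minimal sketch; the sample data and the names pattern and imp are made up for illustration:

```r
library(data.table)

# Hypothetical sample data: only the first row follows the pattern.
dt <- data.table(message = c(
  "THISIsNotImportant: THIS_IS_IMPORTANT Rest of the Message",
  "a message without the pattern"
))

# Same regular expression as above, stored once so both calls agree.
pattern <- ".*:\\s([A-Z_]+).*"

# Keep only the rows that match, then return the captured segment as a vector.
imp <- dt[grepl(pattern, message), sub(pattern, "\\1", message)]
imp
```

Here imp is a character vector with one element per matching row, and the non-matching row is dropped instead of being passed through unchanged.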
Upvotes: 0
Reputation: 11548
Does this work:
library(dplyr)
library(stringr)
df %>% mutate(c2 = str_extract(c1, '(?<=:\\s)[A-Z_]+\\b'))
                                                           c1                  c2
1   THISIsNotImportant: THIS_IS_IMPORTANT Rest of the Message   THIS_IS_IMPORTANT
2 THISIsNotImportant: THIS_IS_UNIMPORTANT Rest of the Message THIS_IS_UNIMPORTANT
Data used:
df
                                                           c1
1   THISIsNotImportant: THIS_IS_IMPORTANT Rest of the Message
2 THISIsNotImportant: THIS_IS_UNIMPORTANT Rest of the Message
Upvotes: 2