Ram

Reputation: 69

R: extract a capitalized substring from a data.table column

I have a data.table with a "message" column.
I need to extract the messages that contain the following pattern:

"THISIsNotImportant: THIS_IS_IMPORTANT Rest of the Message"

How do I extract the messages that match this pattern and store the capitalized segment (THIS_IS_IMPORTANT) in a vector?

Upvotes: 0

Views: 247

Answers (3)

Ronak Shah

Reputation: 389325

Using sub in base R:

x <- "THISIsNotImportant: THIS_IS_IMPORTANT Rest of the Message"
sub('.*:\\s([A-Z_]+).*', '\\1', x)
#[1] "THIS_IS_IMPORTANT"

To add this as a new column in the data.table for all rows, you can do:

library(data.table)
dt[, imp_message := sub('.*:\\s([A-Z_]+).*', '\\1', message)]
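The question also asks to keep only the messages that contain the pattern and to collect the capitalized segment in a vector. Note that sub returns its input unchanged when the pattern does not match, so it can help to filter with grepl first. A minimal sketch, assuming a hypothetical sample table (the third row deliberately lacks the pattern) and a slightly stricter, anchored pattern than the one above:

```r
library(data.table)

# Hypothetical sample data; the third row has no "prefix: CAPS_TOKEN" pattern
dt <- data.table(message = c(
  "THISIsNotImportant: THIS_IS_IMPORTANT Rest of the Message",
  "THISIsNotImportant: THIS_IS_UNIMPORTANT Rest of the Message",
  "a message without the pattern"
))

# Assumed pattern: anything up to the first colon, whitespace,
# then a run of capitals/underscores that ends at a word boundary
pattern <- "^[^:]*:\\s+([A-Z_]+)\\b.*$"

# Keep only the rows whose message matches the pattern
matched <- dt[grepl(pattern, message, perl = TRUE)]

# Pull the captured segment out of the matching rows into a character vector
imp <- matched[, sub(pattern, "\\1", message, perl = TRUE)]
print(imp)
```

Filtering before calling sub avoids storing whole unmatched messages in the result vector.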

Upvotes: 0

Pal R.K.

Reputation: 118

library(stringr)
str_extract(s, '\\b[A-Z_]+\\b')  # s: character vector of messages
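One thing to keep in mind: this pattern returns the first all-caps token anywhere in the string, not necessarily the one after the colon, and str_extract gives NA where nothing matches. A small sketch, assuming hypothetical inputs and comparing against a variant that requires the preceding colon and whitespace:

```r
library(stringr)

# Hypothetical inputs: the second has an extra all-caps word up front,
# the third contains no capitalized token at all
s <- c(
  "THISIsNotImportant: THIS_IS_IMPORTANT Rest of the Message",
  "ERROR THISIsNotImportant: THIS_IS_IMPORTANT Rest of the Message",
  "nothing capitalized here"
)

# First standalone run of capitals/underscores anywhere in the string
first_caps <- str_extract(s, "\\b[A-Z_]+\\b")

# Same token class, but only when preceded by a colon and one whitespace character
after_colon <- str_extract(s, "(?<=:\\s)[A-Z_]+\\b")

print(data.frame(first_caps, after_colon))
```

If the messages can contain other all-caps words before the colon, the lookbehind variant is probably the safer choice.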

Upvotes: 1

Karthik S

Reputation: 11548

Does this work?

library(dplyr)
library(stringr)
df %>% mutate(c2 = str_extract(c1, '(?<=:\\s)[A-Z_]+\\b'))
#                                                            c1                  c2
# 1   THISIsNotImportant: THIS_IS_IMPORTANT Rest of the Message   THIS_IS_IMPORTANT
# 2 THISIsNotImportant: THIS_IS_UNIMPORTANT Rest of the Message THIS_IS_UNIMPORTANT

Data used:

df
                                                           c1
1   THISIsNotImportant: THIS_IS_IMPORTANT Rest of the Message
2 THISIsNotImportant: THIS_IS_UNIMPORTANT Rest of the Message
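If you would rather avoid extra packages, the same lookbehind should also work in base R with perl = TRUE. A rough sketch, assuming a hypothetical character vector msgs; regmatches drops the non-matching elements, so the result is a plain vector of the extracted segments:

```r
# Hypothetical messages; the last one does not contain the pattern
msgs <- c(
  "THISIsNotImportant: THIS_IS_IMPORTANT Rest of the Message",
  "THISIsNotImportant: THIS_IS_UNIMPORTANT Rest of the Message",
  "a message without the pattern"
)

# Locate the capitalized token that follows a colon and one whitespace character
m <- regexpr("(?<=:\\s)[A-Z_]+\\b", msgs, perl = TRUE)

# Extract only the matches; elements without a match are dropped
imp <- regmatches(msgs, m)

# Logical index of the messages that contain the pattern (regexpr returns -1 otherwise)
has_pattern <- as.integer(m) != -1L

print(imp)
print(msgs[has_pattern])
```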

Upvotes: 2
