Kiwi

Reputation: 161

Need to extract whole sentences which begin with a specific word in R

I need to extract whole sentences that begin with a specific word in R. Below is the code I am trying to use, but it does not give the desired result. I am new to regular expressions in R. I want to extract the sentences that begin with the word 'database'.

 sent <- c("database connection","connection database fail", "fail connection database","database connection is good")
 m <- gregexpr('database.*', sent)
 regmatches(sent, m)

The code above returns 'database' and everything after it, wherever the word appears in the string. My desired output is:

 "database connection", "database connection is good"

Thanks for your help!

Upvotes: 0

Views: 1163

Answers (2)

cderv

Reputation: 6542

With stringr:

sent <- c("database connection","connection database fail", "fail connection database","database connection is good")
stringr::str_subset(sent, "^database.*")
#> [1] "database connection"         "database connection is good"

With base R:

sent <- c("database connection","connection database fail", "fail connection database","database connection is good")
grep("^database.*", sent, value = TRUE)
#> [1] "database connection"         "database connection is good"
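If "database" has to be a complete first word rather than just a prefix, a word boundary may help. The following is a minimal sketch, not part of the original answer: the fifth element, "databases are down", is a hypothetical extra case added only to show the difference, and the names prefix_hits and word_hits are made up for illustration.

```r
sent <- c("database connection", "connection database fail",
          "fail connection database", "database connection is good",
          "databases are down")  # hypothetical extra case, not in the question

# Plain prefix test without a regex; this also keeps "databases are down"
prefix_hits <- sent[startsWith(sent, "database")]
prefix_hits

# Anchored regex with a word boundary (\\b): "database" must be a whole word
word_hits <- grep("^database\\b", sent, value = TRUE)
word_hits
```

startsWith() is available in base R from version 3.3.0 and treats its second argument as a literal string, so no escaping is needed when the word contains regex metacharacters.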

Upvotes: 3

Eli Sadoff

Reputation: 7308

You're not anchoring the regex to the start of the string. If you use the start anchor (^), the pattern only matches when 'database' is at the beginning, and you'll get the desired result. Here is what your code should look like:

sent <- c("database connection","connection database fail", "fail connection database","database connection is good")
m <- gregexpr('^database.*', sent)
regmatches(sent, m)

If you want to remove the character(0) elements from the result, you can replace the last line with:

r <- regmatches(sent, m)
r <- r[lengths(r) > 0]
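If a flat character vector is wanted instead of a list, two base R options might be worth considering. This is a sketch under the assumption that each string yields at most one match, which holds for an anchored pattern; the names hits and hits2 are made up for illustration.

```r
sent <- c("database connection", "connection database fail",
          "fail connection database", "database connection is good")

# Option 1: unlist() flattens the list and drops the character(0) elements
m <- gregexpr("^database.*", sent)
hits <- unlist(regmatches(sent, m))
hits

# Option 2: regexpr() finds at most one match per string, and regmatches()
# then returns a character vector containing only the matched strings
hits2 <- regmatches(sent, regexpr("^database.*", sent))
hits2
```

Both variants should return the two sentences that start with "database", without the empty entries.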

Upvotes: 1
