DJJ

Reputation: 2539

R: regexpr, gregexpr, keep track of matches

Let's say I have this data.frame and would like to match column A against the pattern below. This can be done with regexpr or gregexpr, but I would also like to keep track of which rows were matched, as well as the match itself.

df <- data.frame(A=c("where is the pencil? ","the white cat in the kitchen","green hat is over the blue ocean"))

> df
##                                  A
## 1            where is the pencil? 
## 2     the white cat in the kitchen
## 3 green hat is over the blue ocean

pattern <- "(blue|white|green) \\w*"

regmatches(df[,1],regexpr(pattern,df[,1],perl=TRUE))

> regmatches(df[,1],regexpr(pattern,df[,1],perl=TRUE))
## [1] "white cat" "green hat"

desired output:

##                                  A     match
## 1            where is the pencil?       <NA>
## 2     the white cat in the kitchen white cat
## 3 green hat is over the blue ocean green hat

Upvotes: 0

Views: 122

Answers (1)

G. Grothendieck

Reputation: 269694

Change pattern to:

pattern <- paste0(pattern, "|$")

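and then replace empty strings with NA. The added |$ alternative matches the empty string at the end of any row the original pattern misses, so regexpr finds a match in every row and regmatches returns one element per row, aligned with the rows of df. perl=TRUE is not needed.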
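
A minimal sketch of this suggestion, applied to the data from the question. The names pattern2 and m are illustrative only, and stringsAsFactors = FALSE is an added assumption so that column A is a character vector in older versions of R as well:

```r
# Data from the question; stringsAsFactors = FALSE is an assumption added here
# so that A is a character vector regardless of R version.
df <- data.frame(A = c("where is the pencil? ",
                       "the white cat in the kitchen",
                       "green hat is over the blue ocean"),
                 stringsAsFactors = FALSE)

pattern <- "(blue|white|green) \\w*"

# Append "|$" so rows without a real match still yield an (empty) match.
pattern2 <- paste0(pattern, "|$")

# One element per row, because every row now matches something.
m <- regmatches(df$A, regexpr(pattern2, df$A))

# Empty strings mark the rows where the original pattern did not match.
m[m == ""] <- NA
df$match <- m

# Row indices that matched the original pattern.
matched_rows <- which(!is.na(df$match))

print(df)
```

Here matched_rows holds the indices of the rows that matched, and df$match holds the first match per row, or NA where there was none.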

Upvotes: 1
