lookbehind in str_extract with R

Question

I have the following text file:

[01/29/14 16:42:55, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role)
[01/29/14 16:42:57, 10.100.120.120, unknown]: spatial_monitor: Alan left Conference Room (Zone Role contains Person role)
[01/29/14 16:43:00, 10.100.120.120, unknown]: spatial_monitor: Kurt entered Conference Room (Computer desk contains Person role)
[01/29/14 16:43:02, 10.100.120.120, unknown]: spatial_monitor: Kurt left Conference Room (Computer desk contains Person role)
[01/29/14 16:43:03, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role)
[01/29/14 16:43:08, 10.100.120.120, unknown]: spatial_monitor: Alan left Conference Room (Zone Role contains Person role)
[01/29/14 16:46:07, 10.100.120.120, unknown]: spatial_monitor: Fred entered Conference Room (Zone Role contains Person role)
[01/29/14 16:46:08, 10.100.120.120, unknown]: spatial_monitor: Fred left Conference Room (Zone Role contains Person role)

I am trying to use str_extract in R (from the stringr package) to extract the names of locations ("Conference Room" in the example above). The logic is to pull the portion of the string that follows the word "entered" or "left". To this end, I have the following regular expression, which handles the "entered" case:

(?<=entered\s)[A-Z][a-z]+\s[A-Z][a-z]+

This works fine in Notepad++. However, when I embed it in R, I get the following error:

> tt <- "[01/29/14 16:42:55, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role)"
> str_extract(tt, '(?<=entered\\s)[A-Z][a-z]+\\s[A-Z][a-z]+')
Error in regexpr("(?<=entered\s)[A-Z][a-z]+\s[A-Z][a-z]+", "[01/29/14 16:42:55, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role)",  : 
  invalid regular expression '(?<=entered\s)[A-Z][a-z]+\s[A-Z][a-z]+', reason 'Invalid regexp'

Other answers tell me that lookahead and lookbehind only work with Perl. So the question is how to enable Perl with str_extract? Or is there a better way of doing this? Thanks in advance.

lukeA · Accepted Answer

library(stringr)
tt <- "[01/29/14 16:42:55, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role)"
# Backslashes must be doubled inside an R string literal
str_extract(tt, perl('(?<=entered\\s)[A-Z][a-z]+\\s[A-Z][a-z]+'))
# [1] "Conference Room"
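
If you would rather not depend on stringr for this, base R can do the same thing by passing perl = TRUE to regexpr() and pulling the match out with regmatches(). This is only a minimal sketch of that alternative, reusing the tt string from the question; the variable names pattern, m and location are just illustrative.

```r
tt <- "[01/29/14 16:42:55, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role)"

# Same lookbehind pattern as in the question, with backslashes doubled for R
pattern <- "(?<=entered\\s)[A-Z][a-z]+\\s[A-Z][a-z]+"

# perl = TRUE switches regexpr() to the PCRE engine, which supports lookbehind
m <- regexpr(pattern, tt, perl = TRUE)

# regmatches() returns the matched substring(s) located by regexpr()
location <- regmatches(tt, m)
print(location)
```

Note that regmatches() silently drops elements with no match, so on a vector of log lines the result can be shorter than the input.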

Update: As of stringr 1.3.0 (2018-02-19), perl() has been removed. You can now simply call str_extract(tt, '(?<=entered\\s)[A-Z][a-z]+\\s[A-Z][a-z]+').
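
The question also wants the text after "left", not only after "entered". Here is a rough sketch of how that might look on several log lines at once, assuming a recent stringr (which, as far as I know, uses the ICU engine through stringi and accepts lookbehind as long as its length is bounded). The vector log_lines and the variable names are made up for illustration; the two lines are taken from the sample in the question.

```r
library(stringr)

# Two sample lines from the question's log file
log_lines <- c(
  "[01/29/14 16:42:55, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role)",
  "[01/29/14 16:43:02, 10.100.120.120, unknown]: spatial_monitor: Kurt left Conference Room (Computer desk contains Person role)"
)

# Option 1: lookbehind with one alternative per keyword
lookbehind_pattern <- "(?<=entered\\s|left\\s)[A-Z][a-z]+\\s[A-Z][a-z]+"
locations <- str_extract(log_lines, lookbehind_pattern)

# Option 2: no lookbehind at all; capture the location in a group
# and take the second column of the match matrix (first column is the full match)
capture_pattern <- "(?:entered|left)\\s([A-Z][a-z]+\\s[A-Z][a-z]+)"
locations_alt <- str_match(log_lines, capture_pattern)[, 2]

print(locations)
print(locations_alt)
```

The capture-group version may be the more portable choice, since it does not rely on lookbehind support in whichever regex engine is in use. Both versions assume the location is exactly two capitalized words, as in the sample data.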
