Reputation: 136
I'm trying to parse a character vector in R, but I seem to be getting inconsistent results. I can't figure out why. Here's the vector:
> str(superbowl$Detail)
chr [1:189] "Matthew Bosher kicks off 65 yards touchback" ...
> dput(superbowl[ , 5])
c("Matthew Bosher kicks off 65 yards touchback", "Tom Brady pass incomplete short middle intended for Julian Edelman",
"Tom Brady pass complete short right to Julian Edelman for 9 yards (tackle by Philip Wheeler)",
"LeGarrette Blount right tackle for no gain (tackle by Deion Jones)",
"Ryan Allen punts 51 yards returned by Eric Weems for 1 yard (tackle by Barkevious Mingo). Penalty on Paul Worrilow: Offensive Holding 7 yards",
"Devonta Freeman left end for 37 yards (tackle by Malcolm Butler and Devin McCourty)",
"Devonta Freeman left end for 3 yards (tackle by Trey Flowers and Malcom Brown)",
"Matt Ryan pass complete short right to Patrick DiMarco for 2 yards (tackle by Patrick Chung)",
"Matt Ryan sacked by Trey Flowers for -10 yards", "Matthew Bosher punts 55 yards returned by Julian Edelman for 5 yards (tackle by C.J. Goodwin)",
"Julian Edelman right end for 2 yards (tackle by Keanu Neal and Deion Jones)",
"Tom Brady pass complete short left to Danny Amendola for 13 yards (tackle by Brian Poole)",
"Tom Brady pass complete short left to Chris Hogan for 15 yards (tackle by Jalen Collins)",
"LeGarrette Blount left tackle for 2 yards (tackle by Keanu Neal and Brooks Reed)",
"Tom Brady pass complete short right to Malcolm Mitchell for 7 yards (tackle by Deion Jones)",
"Tom Brady pass complete short middle to James White for 11 yards (tackle by Jalen Collins)",
"Tom Brady sacked by Courtney Upshaw for -8 yards", "Tom Brady pass incomplete deep left intended for James White (defended by Deion Jones)",
"Tom Brady sacked by Grady Jarrett for -1 yards", "Ryan Allen punts 37 yards fair catch by Eric Weems",
"Tevin Coleman right end for 9 yards (tackle by Devin McCourty)",
"Matt Ryan pass complete short left to Patrick DiMarco for 10 yards (tackle by Patrick Chung and Logan Ryan)",
"Devonta Freeman right tackle for 2 yards (tackle by Alan Branch and Rob Ninkovich)",
"Tevin Coleman left end for 5 yards (tackle by Logan Ryan)",
"Matt Ryan sacked by Jabaal Sheard and Alan Branch for -2 yards",
"Matthew Bosher punts 55 yards returned by Julian Edelman for 9 yards (tackle by Eric Weems)",
"Tom Brady pass complete short left to Julian Edelman for 13 yards (tackle by Robert Alford)",
"LeGarrette Blount middle for 7 yards (tackle by Robert Alford and Ricardo Allen)",
"LeGarrette Blount middle for 2 yards (tackle by De'Vondre Campbell)",
"Tom Brady pass complete deep right to Julian Edelman for 27 yards (tackle by Robert Alford)",
"LeGarrette Blount left tackle for 4 yards (tackle by Deion Jones). LeGarrette Blount fumbles (forced by Deion Jones) recovered by Robert Alford at ATL-29 (tackle by Julian Edelman). Penalty on Martellus Bennett: Offensive Holding (Declined)",
"Matt Ryan pass complete short middle to Julio Jones for 19 yards (tackle by Logan Ryan)",
"Matt Ryan pass complete deep left to Julio Jones for 23 yards",
"Devonta Freeman right tackle for 15 yards (tackle by Devin McCourty)",
"Devonta Freeman left tackle for 9 yards (tackle by Duron Harmon)",
"Timeout #1 by New England Patriots", "Devonta Freeman left end for 5 yards touchdown",
"Matt Bryant kicks extra point good", "Matthew Bosher kicks off 69 yards returned by Dion Lewis for 20 yards (tackle by Sharrod Neasman)",
"Tom Brady pass incomplete deep right intended for Malcolm Mitchell (defended by C.J. Goodwin)",
"Dion Lewis middle for 3 yards (tackle by Ricardo Allen)", "Tom Brady pass complete short left to James White for 5 yards (tackle by C.J. Goodwin)",
"Ryan Allen punts 38 yards", "Matt Ryan pass complete deep right to Taylor Gabriel for 24 yards (tackle by Devin McCourty)",
"Matt Ryan pass complete deep left to Julio Jones for 18 yards (tackle by Eric Rowe)",
"Tevin Coleman right tackle for 1 yard (tackle by Alan Branch)",
"Matt Ryan pass incomplete deep right intended for Austin Hooper (defended by Patrick Chung)",
"Matt Ryan pass complete deep left to Austin Hooper for 19 yards touchdown. Penalty on Patrick Chung: Defensive Pass Interference (Declined)",
"Penalty on Shea McClellin: Illegal Formation 5 yards (no play)",
"Matt Bryant kicks extra point good", "Matthew Bosher kicks off 65 yards touchback",
"Tom Brady pass complete short middle to Martellus Bennett for 12 yards (tackle by Jalen Collins and Keanu Neal)",
"Tom Brady pass incomplete short middle intended for Julian Edelman",
"Tom Brady pass incomplete deep middle intended for Danny Amendola",
"Tom Brady pass complete short right to James White for 8 yards (tackle by Deion Jones and De'Vondre Campbell). Penalty on Robert Alford: Defensive Holding 5 yards (no play)",
"LeGarrette Blount left tackle for no gain (tackle by Joe Vellano and Jonathan Babineaux)",
"James White right end for 7 yards (tackle by Robert Alford)",
"Tom Brady pass incomplete deep right intended for Julian Edelman. Penalty on Brian Poole: Defensive Holding 5 yards (no play)",
"LeGarrette Blount right end for 1 yard (tackle by Tyson Jackson)",
"Tom Brady pass incomplete short right intended for Dion Lewis",
"Timeout #1 by Atlanta Falcons", "Tom Brady pass incomplete short middle intended for Julian Edelman. Penalty on Brian Poole: Defensive Holding 5 yards (no play)",
"LeGarrette Blount middle for no gain (tackle by Ra'Shede Hageman)",
"Tom Brady pass complete short middle to Martellus Bennett for 13 yards (tackle by Ayodeji Olatoye)",
"Dion Lewis right guard for 4 yards (tackle by De'Vondre Campbell)",
"Dion Lewis left guard for no gain (tackle by Grady Jarrett)",
"Tom Brady pass incomplete short left intended for Danny Amendola is intercepted by Robert Alford at ATL-18 and returned for 82 yards touchdown",
"Matt Bryant kicks extra point good", "Matthew Bosher kicks off 65 yards touchback",
"James White right guard for 3 yards (tackle by Ra'Shede Hageman and Robert Alford)",
"Tom Brady pass complete short left to Martellus Bennett for 15 yards (tackle by Keanu Neal and Ricardo Allen)",
"Timeout #2 by Atlanta Falcons", "Tom Brady pass incomplete deep right intended for Julian Edelman",
"Tom Brady pass complete short right to James White for 28 yards (tackle by Ricardo Allen)",
"Tom Brady pass complete short right to Chris Hogan for 8 yards (tackle by Keanu Neal)",
"Tom Brady pass incomplete short right intended for Chris Hogan",
"Tom Brady pass complete short middle to James White for 6 yards (tackle by Deion Jones)",
"Timeout #2 by New England Patriots", "Tom Brady pass incomplete short left intended for Julian Edelman (defended by Robert Alford)",
"Tom Brady pass complete short left to James White for 5 yards (tackle by Keanu Neal). Penalty on Martellus Bennett: Offensive Holding 10 yards",
"Tom Brady pass complete short left to Martellus Bennett for -3 yards (tackle by Ayodeji Olatoye and Grady Jarrett)",
"Stephen Gostkowski 41 yard field goal good", "Timeout #3 by New England Patriots",
"Stephen Gostkowski kicks off 38 yards", "Stephen Gostkowski kicks off 60 yards returned by Eric Weems for 14 yards (tackle by Nate Ebner)",
"Devonta Freeman left tackle for -3 yards (tackle by Dont'a Hightower)",
"Matt Ryan pass complete short middle to Devonta Freeman for 7 yards (tackle by Logan Ryan)",
"Matt Ryan pass incomplete short left intended for Taylor Gabriel (defended by Eric Rowe)",
"Matthew Bosher punts 56 yards returned by Julian Edelman for 26 yards (tackle by C.J. Goodwin)",
"ATL challenged the runner was in bounds ruling and the play was overturned. Matthew Bosher punts 56 yards returned by Julian Edelman for 26 yards (tackle by Eric Weems)",
"Tom Brady pass incomplete deep left intended for Chris Hogan",
"Tom Brady pass complete short right to Danny Amendola for -2 yards (tackle by Brian Poole). Penalty on Chris Hogan: Offensive Pass Interference (Declined)",
"Timeout #1 by Atlanta Falcons", "Tom Brady pass incomplete short middle intended for Julian Edelman",
"Ryan Allen punts 40 yards fair catch by Eric Weems", "Matt Ryan pass complete short middle to Taylor Gabriel for 17 yards (tackle by Eric Rowe)",
"Tevin Coleman right tackle for 5 yards (tackle by Elandon Roberts)",
"Matt Ryan pass complete deep middle to Taylor Gabriel for 35 yards (tackle by Duron Harmon)",
"Tevin Coleman left end for no gain (tackle by Patrick Chung)",
"Matt Ryan pass complete short middle to Mohamed Sanu for 13 yards (tackle by Eric Rowe)",
"Devonta Freeman right tackle for 9 yards (tackle by Trey Flowers)",
"Devonta Freeman middle for -3 yards (tackle by Eric Rowe and Rob Ninkovich)",
"Matt Ryan pass incomplete short middle intended for Taylor Gabriel (defended by Malcolm Butler). Penalty on Malcolm Butler: Defensive Pass Interference 3 yards (no play)",
"Matt Ryan pass complete short right to Tevin Coleman for 6 yards touchdown",
"Matt Bryant kicks extra point good", "Matthew Bosher kicks off 65 yards touchback",
"Tom Brady pass complete short middle to Dion Lewis for 2 yards (tackle by Keanu Neal)",
"Tom Brady pass complete short left to James White for 12 yards (tackle by Jalen Collins and Ricardo Allen)",
"Dion Lewis middle for 8 yards (tackle by Jalen Collins)", "Dion Lewis middle for -1 yards (tackle by Brooks Reed)",
"Julian Edelman pass incomplete deep right intended for Dion Lewis",
"Tom Brady pass complete short left to Danny Amendola for 17 yards (tackle by De'Vondre Campbell)",
"Tom Brady pass complete short left to Danny Amendola for 2 yards (tackle by Jalen Collins)",
"Tom Brady pass incomplete short left intended for Julian Edelman",
"Tom Brady middle for 15 yards (tackle by Robert Alford)", "LeGarrette Blount right guard for 4 yards (tackle by Keanu Neal)",
"LeGarrette Blount right tackle for 9 yards (tackle by Robert Alford)",
"LeGarrette Blount middle for 2 yards (tackle by Brooks Reed)",
"Tom Brady pass complete short left to James White for 5 yards touchdown",
"Stephen Gostkowski kicks extra point no good", "Stephen Gostkowski kicks onside 11 yards recovered by LaRoy Reynolds. Penalty on Stephen Gostkowski: Illegal Touch Kick 5 yards",
"Matt Ryan pass complete short left to Austin Hooper for 9 yards (tackle by Duron Harmon)",
"Tevin Coleman left tackle for -1 yards (tackle by Trey Flowers). Penalty on Jake Matthews: Offensive Holding 10 yards (no play)",
"Timeout #2 by Atlanta Falcons", "Matt Ryan pass incomplete short right intended for Austin Hooper (defended by Patrick Chung)",
"--", "Penalty on Matthew Bosher: Delay of Game 5 yards (no play)",
"Matthew Bosher punts 42 yards returned by Patrick Chung for -1 yards (tackle by Justin Hardy)",
"Tom Brady pass complete deep right to Malcolm Mitchell for 15 yards (tackle by Jalen Collins)",
"Tom Brady pass complete short middle to Malcolm Mitchell for 7 yards (tackle by Jalen Collins)",
"James White middle for 6 yards (tackle by Keanu Neal)", "Tom Brady pass incomplete deep left intended for Julian Edelman",
"Tom Brady pass complete short right to Malcolm Mitchell for 18 yards (tackle by Robert Alford)",
"Tom Brady pass complete short right to James White for 9 yards (tackle by Robert Alford)",
"Tom Brady pass incomplete short middle intended for Danny Amendola",
"Tom Brady pass complete deep right to Martellus Bennett for 25 yards (tackle by Keanu Neal)",
"Tom Brady sacked by Grady Jarrett for -5 yards", "Tom Brady pass complete short left to James White for 2 yards (tackle by Jalen Collins and De'Vondre Campbell)",
"Tom Brady sacked by Grady Jarrett for -5 yards", "Stephen Gostkowski 33 yard field goal good",
"Stephen Gostkowski kicks off 48 yards returned by Justin Hardy for 10 yards (tackle by Barkevious Mingo)",
"Tevin Coleman right end for 8 yards (tackle by Patrick Chung)",
"Tevin Coleman middle for 1 yard (tackle by Trey Flowers and Logan Ryan)",
"Matt Ryan sacked by Dont'a Hightower for -11 yards. Matt Ryan fumbles (forced by Dont'a Hightower) recovered by Alan Branch at ATL-25 (tackle by Chris Chester)",
"Tom Brady sacked by Dwight Freeney for -5 yards", "Tom Brady pass complete short middle to James White for 4 yards (tackle by Keanu Neal)",
"Tom Brady pass complete short left to Malcolm Mitchell for 12 yards (tackle by C.J. Goodwin)",
"Tom Brady pass complete short left to Danny Amendola for 8 yards (tackle by Ricardo Allen)",
"Tom Brady pass complete short left to Danny Amendola for 6 yards touchdown",
"Two Point Attempt: James White middle conversion succeeds",
"Stephen Gostkowski kicks off 62 yards returned by Justin Hardy for 7 yards (tackle by Jonathan Jones)",
"Matt Ryan pass complete short left to Devonta Freeman for 39 yards (tackle by Elandon Roberts)",
"Devonta Freeman right end for 2 yards (tackle by Jabaal Sheard and Patrick Chung)",
"Matt Ryan pass complete deep right to Julio Jones for 27 yards",
"Devonta Freeman left end for -1 yards (tackle by Devin McCourty)",
"Matt Ryan sacked by Trey Flowers for -12 yards", "Timeout #1 by New England Patriots",
"Matt Ryan pass complete short left to Mohamed Sanu for 9 yards (tackle by Logan Ryan). Penalty on Jake Matthews: Offensive Holding 10 yards (no play)",
"Matt Ryan pass incomplete short left intended for Taylor Gabriel",
"Matthew Bosher punts 36 yards fair catch by Julian Edelman",
"Tom Brady pass incomplete short right intended for James White",
"Tom Brady pass incomplete deep right intended for Chris Hogan",
"Tom Brady pass complete short right to Chris Hogan for 16 yards (tackle by Jalen Collins)",
"Tom Brady pass incomplete short middle intended for Julian Edelman (defended by Robert Alford)",
"Tom Brady pass complete short left to Malcolm Mitchell for 11 yards (tackle by Jalen Collins)",
"Tom Brady pass complete deep middle to Julian Edelman for 23 yards (tackle by Keanu Neal)",
"ATL challenged the pass completion ruling and the play was upheld.",
"Tom Brady pass complete deep right to Danny Amendola for 20 yards (tackle by Brian Poole)",
"Tom Brady pass complete short right to James White for 13 yards (tackle by Brian Poole and Ricardo Allen)",
"Tom Brady pass complete short right to James White for 7 yards (tackle by Deion Jones)",
"James White right guard for 1 yard touchdown", "Two Point Attempt: Tom Brady pass complete to Danny Amendola conversion succeeds. Penalty on Dwight Freeney: Defensive Offside 5 yards",
"Stephen Gostkowski kicks off 60 yards returned by Eric Weems for 11 yards (tackle by Brandon Bolden)",
"Matt Ryan pass complete short left to Mohamed Sanu for 12 yards (tackle by Logan Ryan)",
"Matt Ryan pass complete short right to Austin Hooper for 4 yards (tackle by Malcolm Butler)",
"Matt Ryan spiked the ball", "Matt Ryan pass incomplete deep left intended for Austin Hooper",
"Matthew Bosher punts 38 yards fair catch by Julian Edelman",
"Dion Lewis for 13 yards", "Matthew Bosher kicks off 65 yards touchback",
"Tom Brady pass complete short left to James White for 6 yards (tackle by Deion Jones)",
"Tom Brady pass complete short right to Danny Amendola for 14 yards",
"Tom Brady pass complete short left to Chris Hogan for 18 yards (tackle by Keanu Neal and Deion Jones)",
"Tom Brady pass complete short left to James White for -3 yards (tackle by Deion Jones)",
"Tom Brady pass complete short left to Julian Edelman for 15 yards (tackle by Robert Alford)",
"James White right end for 10 yards (tackle by Robert Alford)",
"Tom Brady pass incomplete short right intended for Martellus Bennett (defended by De'Vondre Campbell). Penalty on De'Vondre Campbell: Defensive Pass Interference 13 yards (no play)",
"Tom Brady pass incomplete short right intended for Martellus Bennett (defended by Vic Beasley)",
"James White right end for 2 yards touchdown")
I would like to create a new vector, superbowl$Sacker, that finds the word "sacked" in every element containing it and returns the second and third words after it (the name of the player credited with the sack). For elements containing "sacked" in which the fourth word after "sacked" is "and" (those instances where two players are credited with the sack), I would like superbowl$Sacker to equal the second through sixth words after "sacked" ("first-name last-name and first-name last-name"). For all elements not containing "sacked", I'd like superbowl$Sacker to be NA. So, for example, superbowl$Sacker[1:10] should look like:
> superbowl$Sacker[1:10]
[1] NA NA
[3] NA NA
[5] NA NA
[7] NA NA
[9] "Trey Flowers" NA
I've tried this a few different ways, mostly using gsub() and library(stringr), but nothing seems to work consistently on all elements of the vector. One element in particular, superbowl$Sacker[144], seems to be interpreted differently from the others. Additionally, element superbowl$Sacker[25] is a special case. Here are the three basic approaches I've tried so far:
I.)
> superbowl$Sacker <- gsub("(\\w+\\s)*sacked\\s(\\w+)(\\s\\w+)(\\s\\w+).*", "\\3\\4",superbowl$Detail)
> superbowl$Sacker[superbowl$Sacker == superbowl$Detail] <- NA
> superbowl[superbowl$Is.Sack == TRUE, c(5, 36, 54)]
Detail
9 Matt Ryan sacked by Trey Flowers for -10 yards
17 Tom Brady sacked by Courtney Upshaw for -8 yards
19 Tom Brady sacked by Grady Jarrett for -1 yards
25 Matt Ryan sacked by Jabaal Sheard and Alan Branch for -2 yards
137 Tom Brady sacked by Grady Jarrett for -5 yards
139 Tom Brady sacked by Grady Jarrett for -5 yards
144 Matt Ryan sacked by Dont'a Hightower for -11 yards. Matt Ryan fumbles (forced by Dont'a Hightower) recovered by Alan Branch at ATL-25 (tackle by Chris Chester)
145 Tom Brady sacked by Dwight Freeney for -5 yards
156 Matt Ryan sacked by Trey Flowers for -12 yards
Is.Sack Sacker
9 TRUE Trey Flowers
17 TRUE Courtney Upshaw
19 TRUE Grady Jarrett
25 TRUE Jabaal Sheard
137 TRUE Grady Jarrett
139 TRUE Grady Jarrett
144 TRUE <NA>
145 TRUE Dwight Freeney
156 TRUE Trey Flowers
(The problem here is that superbowl$Sacker[25] == "Jabaal Sheard" instead of superbowl$Sacker[25] == "Jabaal Sheard and Alan Branch", and superbowl$Sacker[144] == NA instead of superbowl$Sacker[144] == "Dont'a Hightower".)
II.)
> superbowl$Sacker <- ifelse(superbowl$Is.Sack == TRUE, gsub("(\\w+\\s)(\\w+\\s)(\\w+\\s)(\\w+\\s)(\\w+\\s)(\\w+\\s).*","\\5\\6",superbowl$Detail), NA)
> superbowl[superbowl$Is.Sack == TRUE, c(5, 36, 54)]
Detail
9 Matt Ryan sacked by Trey Flowers for -10 yards
17 Tom Brady sacked by Courtney Upshaw for -8 yards
19 Tom Brady sacked by Grady Jarrett for -1 yards
25 Matt Ryan sacked by Jabaal Sheard and Alan Branch for -2 yards
137 Tom Brady sacked by Grady Jarrett for -5 yards
139 Tom Brady sacked by Grady Jarrett for -5 yards
144 Matt Ryan sacked by Dont'a Hightower for -11 yards. Matt Ryan fumbles (forced by Dont'a Hightower) recovered by Alan Branch at ATL-25 (tackle by Chris Chester)
145 Tom Brady sacked by Dwight Freeney for -5 yards
156 Matt Ryan sacked by Trey Flowers for -12 yards
Is.Sack
9 TRUE
17 TRUE
19 TRUE
25 TRUE
137 TRUE
139 TRUE
144 TRUE
145 TRUE
156 TRUE
Sacker
9 Trey Flowers
17 Courtney Upshaw
19 Grady Jarrett
25 Jabaal Sheard
137 Grady Jarrett
139 Grady Jarrett
144 Matt Ryan sacked by Dont'a Hightower for -11 yards. Matt Ryan fumbles (forced by Dont'a Hightower) recovered by Alan Branch at ATL-25 (tackle by Chris Chester)
145 Dwight Freeney
156 Trey Flowers
(The problem here is that superbowl$Sacker[25] == "Jabaal Sheard" instead of superbowl$Sacker[25] == "Jabaal Sheard and Alan Branch", and superbowl$Sacker[144] == "Matt Ryan sacked by Dont'a Hightower for -11 yards. Matt Ryan fumbles (forced by Dont'a Hightower) recovered by Alan Branch at ATL-25 (tackle by Chris Chester)" instead of superbowl$Sacker[144] == "Dont'a Hightower".)
III.)
> str.c <- str_extract(superbowl$Detail, "(\\w+\\s)*sacked\\s(\\w+)(\\s\\w+)(\\s\\w+).*")
> superbowl$Sacker <- str_sub(str.c, start=21, end=str_length(str.c)-13)
> superbowl[superbowl$Is.Sack == TRUE, c(5, 36, 54)]
Detail
9 Matt Ryan sacked by Trey Flowers for -10 yards
17 Tom Brady sacked by Courtney Upshaw for -8 yards
19 Tom Brady sacked by Grady Jarrett for -1 yards
25 Matt Ryan sacked by Jabaal Sheard and Alan Branch for -2 yards
137 Tom Brady sacked by Grady Jarrett for -5 yards
139 Tom Brady sacked by Grady Jarrett for -5 yards
144 Matt Ryan sacked by Dont'a Hightower for -11 yards. Matt Ryan fumbles (forced by Dont'a Hightower) recovered by Alan Branch at ATL-25 (tackle by Chris Chester)
145 Tom Brady sacked by Dwight Freeney for -5 yards
156 Matt Ryan sacked by Trey Flowers for -12 yards
Is.Sack Sacker
9 TRUE Trey Flowers
17 TRUE Courtney Upshaw
19 TRUE Grady Jarrett
25 TRUE Jabaal Sheard and Alan Branch
137 TRUE Grady Jarrett
139 TRUE Grady Jarrett
144 TRUE <NA>
145 TRUE Dwight Freeney
156 TRUE Trey Flowers
(The problem here, although you can't see it, is that some of these elements contain an additional space at the end of the string and some do not. I cannot account for this discrepancy, but I'd like them all to end with the last character of the name, with no trailing space. Additionally, superbowl$Sacker[144] == NA instead of superbowl$Sacker[144] == "Dont'a Hightower".)
I am new to R and don't completely understand the nuances of regular expressions. Coding aside, is R interpreting superbowl$Sacker[144] differently than the other elements? If so, which characteristics make it unique? Most importantly, though, how do I tell R to return the second and third words after "sacked", the second through sixth words after "sacked" when the fourth word after "sacked" is "and", and NA in all other cases?
Upvotes: 2
Views: 158
Reputation: 627469
You seem to want to get any substring between sacked by and for.
I extracted the sample elements you described into a separate variable (x holds the Detail vector):
> y<-x[c(1:10,25,144)]
> y
[1] "Matthew Bosher kicks off 65 yards touchback"
[2] "Tom Brady pass incomplete short middle intended for Julian Edelman"
[3] "Tom Brady pass complete short right to Julian Edelman for 9 yards (tackle by Philip Wheeler)"
[4] "LeGarrette Blount right tackle for no gain (tackle by Deion Jones)"
[5] "Ryan Allen punts 51 yards returned by Eric Weems for 1 yard (tackle by Barkevious Mingo). Penalty on Paul Worrilow: Offensive Holding 7 yards"
[6] "Devonta Freeman left end for 37 yards (tackle by Malcolm Butler and Devin McCourty)"
[7] "Devonta Freeman left end for 3 yards (tackle by Trey Flowers and Malcom Brown)"
[8] "Matt Ryan pass complete short right to Patrick DiMarco for 2 yards (tackle by Patrick Chung)"
[9] "Matt Ryan sacked by Trey Flowers for -10 yards"
[10] "Matthew Bosher punts 55 yards returned by Julian Edelman for 5 yards (tackle by C.J. Goodwin)"
[11] "Matt Ryan sacked by Jabaal Sheard and Alan Branch for -2 yards"
[12] "Matt Ryan sacked by Dont'a Hightower for -11 yards. Matt Ryan fumbles (forced by Dont'a Hightower) recovered by Alan Branch at ATL-25 (tackle by Chris Chester)"
And ran
> library(stringr)
> trimws(str_extract(y, "(?<=\\bsacked by).+?(?=\\bfor\\b)"))
[1] NA NA NA NA NA
[6] NA NA NA "Trey Flowers" NA
[11] "Jabaal Sheard and Alan Branch" "Dont'a Hightower"
Here, the pattern (?<=\\bsacked by).+?(?=\\bfor\\b) means:
- (?<=\\bsacked by) - make sure there is a whole word sacked, a space, and by immediately to the left of the current location
- .+? - any 1 or more chars other than line break chars, as few as possible, up to the first occurrence of (but excluding it from the match) the subpattern that follows
- (?=\\bfor\\b) - whole word for.
Note that \b are word boundaries that help match a substring of word chars as a whole word.
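As for why element 144 behaved differently in the question's attempts: \w matches only letters, digits and the underscore, so the apostrophe in "Dont'a" breaks a word-by-word pattern. The following is a minimal base R sketch of that difference, assuming two shortened sample strings adapted from the question (the names x, word_pat, look_pat and sacker are illustrative and do not come from the original post):

```r
# Two illustrative strings adapted from the question's data
x <- c("Matt Ryan sacked by Trey Flowers for -10 yards",
       "Matt Ryan sacked by Dont'a Hightower for -11 yards. Matt Ryan fumbles")

# Word-by-word pattern in the spirit of approach I: \w covers letters, digits
# and "_" only, so the apostrophe in "Dont'a" interrupts the \w+ run
word_pat <- "sacked\\s(\\w+)(\\s\\w+)(\\s\\w+)"
word_hit <- grepl(word_pat, x, perl = TRUE)

# Lookaround pattern: take whatever sits between "sacked by" and the word "for"
look_pat <- "(?<=\\bsacked by).+?(?=\\bfor\\b)"
m <- regexpr(look_pat, x, perl = TRUE)

# regexpr() returns -1 where nothing matched; those elements stay NA
sacker <- rep(NA_character_, length(x))
sacker[m > 0] <- trimws(regmatches(x, m))
```

Here word_hit is FALSE for the second string, while sacker holds a name for both, because .+? does not care which characters make up the name.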
Here is a variation of the regex with str_match, which allows access to the capturing group contents (and thus, we may use quantifiers with spaces, just in case there may be more than 1 between sacked and by):
> res <- str_match(y, "\\bsacked\\s+by\\s*(.+?)\\s*\\bfor\\b")
> res[,2]
[1] NA NA NA NA NA
[6] NA NA NA "Trey Flowers" NA
[11] "Jabaal Sheard and Alan Branch" "Dont'a Hightower"
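To get from the matched values to the requested superbowl$Sacker column, something along these lines may work. It is a sketch only, using base R's regexec() in place of str_match and a small stand-in data frame (superbowl_demo, detail, pat and hits are hypothetical names, not objects from the question):

```r
# Small stand-in for the question's data frame; only the Detail column is modelled
detail <- c("Matthew Bosher kicks off 65 yards touchback",
            "Matt Ryan sacked by Trey Flowers for -10 yards",
            "Matt Ryan sacked by Jabaal Sheard and Alan Branch for -2 yards")
superbowl_demo <- data.frame(Detail = detail, stringsAsFactors = FALSE)

# Same capturing-group pattern as in the str_match call above
pat <- "\\bsacked\\s+by\\s*(.+?)\\s*\\bfor\\b"

# regexec() + regmatches() give one character vector per element:
# full match first, then the capture group; character(0) when nothing matched
hits <- regmatches(superbowl_demo$Detail,
                   regexec(pat, superbowl_demo$Detail, perl = TRUE))

# Keep the capture group, or NA for plays that are not sacks
superbowl_demo$Sacker <- vapply(
  hits,
  function(h) if (length(h) >= 2) h[2] else NA_character_,
  character(1)
)
```

Because the name is captured by the pattern rather than cut out with fixed character offsets, the result does not depend on how many digits the yardage has. That difference (" for -8 yards" versus " for -10 yards") appears to be the source of the stray trailing spaces in approach III.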
Upvotes: 3