TheCodeNovice

Reputation: 690

Reference only the last match in str_extract_all in R

I have some code that parses text, and I've hit a sticking point: because of inconsistent data, my regex sometimes captures two things instead of one.

library(stringr)
temp <- "abc abcdef"
str_extract_all(temp, "ab.+")
# [[1]]
# [1] "abc abcdef"

str_extract_all(temp, "ab.+")[[1]][2]
# [1] NA

Above is a simplified example of what I am working with. When I rapply this function, I may get 1, 2, or 3 matches. The last match is the one that matters for my purposes, but I am not sure how to reference it.

Upvotes: 0

Views: 1090

Answers (3)

gaut

Reputation: 5958

You can use, for example:

# [[1]] selects the matches for the first input string; take length() of that vector
m <- str_extract_all(temp, "ab.+")[[1]]
m[length(m)]
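The distinction that matters here is that str_extract_all() returns a list with one element per input string, and each element is a character vector of that string's matches. Calling length() on the outer list therefore counts input strings, not matches. A minimal sketch illustrating this, assuming stringr is installed; the input vector x and the pattern "ab\\d" are hypothetical:

```r
library(stringr)

# Hypothetical inputs: one string with one match, one with three
x <- c("ab1", "ab1 ab2 ab3")

res <- str_extract_all(x, "ab\\d")

# Outer list: one element per input string
n_strings <- length(res)
# Inner vector: one element per match within the second string
n_matches <- length(res[[2]])

# Last match of the second string
last_of_second <- res[[2]][length(res[[2]])]
print(c(n_strings, n_matches))
print(last_of_second)
```

With a single input string, as in the question, the outer length is always 1, which is why indexing the inner vector is required.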

Upvotes: 1

moodymudskipper

Reputation: 47320

Not very elegant, but it gets the job done:

. <- str_extract_all(temp,".*?(?=(ab)|$)")[[1]]
paste0("a",.[[length(.)-1]])
# [1] "abcdef"

Or, if each match can only be a single word, maybe you wanted something like this?

. <- str_extract_all(temp,"\\bab.+?\\b")[[1]]
dplyr::last(.)
#[1] "abcdef"
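The original pattern "ab.+" yields only one match because .+ is greedy: after the first "ab" it consumes the rest of the string, spaces included. A sketch comparing it with a pattern restricted to word characters, assuming (as this answer does) that a single match should not span a space:

```r
library(stringr)

temp <- "abc abcdef"

# Greedy .+ runs to the end of the string, so there is only one match
greedy <- str_extract_all(temp, "ab.+")[[1]]

# \\w* cannot match the space, so each word starting with "ab" is its own match
words <- str_extract_all(temp, "ab\\w*")[[1]]

print(greedy)
print(words)
print(tail(words, 1))
```

Once each candidate is a separate match, any "last element" idiom (tail(), dplyr::last(), or m[length(m)]) picks out "abcdef".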

Upvotes: 1

Julius Vainora

Reputation: 48211

As I understand it, you mean something like

txt <- "bag of flour"
str_extract_all(txt, "\\b[a-z]+\\b")
# [[1]]
# [1] "bag"   "of"    "flour"

and want to refer to "flour". In that case you can use

tail(str_extract_all(txt, "\\b[a-z]+\\b")[[1]], 1)
# [1] "flour"
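When this is applied across many strings (the question mentions rapply), tail() on each element returns character(0) for strings with no match, which can silently drop entries. A sketch that returns exactly one value per input and uses NA when nothing matches; the helper name last_match and the sample vector are hypothetical. The stringi package, which stringr is built on, also provides stri_extract_last_regex(), which does this directly:

```r
library(stringr)
library(stringi)

# Hypothetical helper: last regex match for each string, NA if there is none
last_match <- function(x, pattern) {
  matches <- str_extract_all(x, pattern)
  vapply(matches,
         function(m) if (length(m) > 0) m[length(m)] else NA_character_,
         character(1))
}

txt <- c("bag of flour", "123", "one two")

via_helper <- last_match(txt, "\\b[a-z]+\\b")
# stringi equivalent; also returns NA where there is no match
via_stringi <- stri_extract_last_regex(txt, "\\b[a-z]+\\b")

print(via_helper)
print(via_stringi)
```

Returning a fixed-length character vector keeps results aligned with the inputs, which is usually what you want when the extracted values go back into a data frame column.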

Upvotes: 2
