How to make optional lookbehind and lookahead in R

Question

I would like to extract the text between de and en as well as the text in the strings that don't have de or en. I am not very good with regex but after reading about lookaheads and lookbehinds I managed to get partly what I want. Now I have to make them optional but whatever I've tried, I can't get it right. Any help would be highly appreciated!

library(stringr)
(sstring = c('{\"de\":\"extract this one\",\"en\":\"some text\"}',     'extract this one',     '{\"de\":\"extract this one\",\"en\":\"some text\"}', "p (340) extract this one"))
#> [1] "{\"de\":\"extract this one\",\"en\":\"some text\"}"
#> [2] "extract this one"                                  
#> [3] "{\"de\":\"extract this one\",\"en\":\"some text\"}"
#> [4] "p (340) extract this one"

str_extract_all(string = sstring, pattern = "(?<=.de\":\").*(?=.,\"en\":)")
#> [[1]]
#> [1] "extract this one"
#> 
#> [[2]]
#> character(0)
#> 
#> [[3]]
#> [1] "extract this one"
#> 
#> [[4]]
#> character(0)

desired output:

#> [1] "extract this one"         "extract this one"        
#> [3] "extract this one"         "p (340) extract this one"

Created on 2020-05-08 by the reprex package (v0.3.0)

Daniel O · Accepted Answer

In base R:

gsub('.*de":"(.*)","en.*', "\\1", sstring)


[1] "extract this one"        
[2] "extract this one"        
[3] "extract this one"        
[4] "p (340) extract this one"

Where:

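.* matches any character, any number of times.
(...) parentheses capture whatever they enclose so it can be recalled later with "\\1". Essentially, we are replacing the entire string with only the text we want. Strings that do not match the pattern at all are returned unchanged, which is why the plain strings come through as-is.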
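
As a minimal sketch of the idea above, the snippet below runs the capture-group replacement on a small sample and, for comparison, shows one possible way to keep the lookarounds from the question by adding an alternation for strings that do not start with "{". The vector x and the variable names are made up for illustration, and the alternation assumes that the plain strings never begin with "{".

```r
# Hypothetical sample input mirroring the question's data
x <- c('{"de":"extract this one","en":"some text"}',
       "extract this one",
       "p (340) extract this one")

# Capture-group approach: the backreference is written "\\1" in R source.
# Strings that do not match the pattern are returned unchanged.
captured <- sub('.*de":"(.*)","en.*', "\\1", x)

# Lookaround approach (PCRE, hence perl = TRUE): match either the text
# between de":" and ","en": or, for strings not starting with "{",
# the whole string.
pattern <- '(?<="de":").*?(?=","en":)|^[^{].*'
matched <- regmatches(x, regexpr(pattern, x, perl = TRUE))

print(captured)
print(matched)
```

Both calls give the same three strings for this sample. One difference to keep in mind: regmatches() drops elements with no match, so the capture-group version is the safer choice when the output has to stay aligned with the input.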
