Multiline text extraction in R with stringr

Question

I have a column in my data frame that contains free text.

I would like to extract the text after INDICATIONS FOR EXAMINATION and before the next capitalized line. In the example below the result would be 'Anaemia'

INDICATIONS FOR EXAMINATION
Anaemia

PROCEDURE PERFORMED
Gastroscopy (OGD)

I am having some trouble as I'm using stringr and I can't seem to get multiline matches. I have been using:

EoE$IndicationsFroExamination<-str_extract(EoE$Endo_ResultText, '(?<=INDICATIONS FOR EXAMINATION).*?[A-Z]+')

krzyklo · Accepted Answer

It requires a little digging. You can use the regex() modifier function.

Set multiline = TRUE so that "^" and "$" match at the start and end of every line, not just at the start and end of the whole string:

str_extract_all("a
b
c", "^.")
# [[1]]
# [1] "a"

str_extract_all("a
b
c", regex("^.", multiline = TRUE))
# [[1]]
# [1] "a" "b" "c"
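In multiline mode "$" is line-aware too, so the two anchors together can pick out whole lines. A minimal sketch using the report from the question, assuming a heading is a line made up only of capital letters and spaces (that is a guess about the report layout, and the names report and headings are just for illustration):

```r
library(stringr)

# Sample text from the question, written with explicit "\n" line breaks.
report <- "INDICATIONS FOR EXAMINATION\nAnaemia\n\nPROCEDURE PERFORMED\nGastroscopy (OGD)"

# With multiline = TRUE, "^" and "$" anchor to each line, so only lines
# consisting entirely of capitals and spaces are returned.
headings <- str_extract_all(
  report,
  regex("^[A-Z][A-Z ]+$", multiline = TRUE)
)[[1]]

headings
```

Lines such as "Gastroscopy (OGD)" are skipped because the pattern has to cover the line from its first character to its last.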

Also be aware of the dotall argument, which lets "." match line breaks as well, so that ".*" can span several lines:

str_extract_all("a
b
c", "a.")
# [[1]]
# character(0)

str_extract_all("a
b
c", regex("a.", dotall = TRUE))
# [[1]]
# [1] "a\n"

Both options are documented in stringi::stri_opts_regex(), to which stringr::regex() passes its arguments.
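Putting the two options together for the original question might look like the sketch below. It assumes that the next heading is a line containing only capital letters and spaces, so an all-caps line inside the indication text would end the match early. The names report, pattern and indication are made up for the example.

```r
library(stringr)

# Sample text from the question, written with explicit "\n" line breaks.
report <- "INDICATIONS FOR EXAMINATION\nAnaemia\n\nPROCEDURE PERFORMED\nGastroscopy (OGD)"

# dotall = TRUE lets the lazy ".*?" run across line breaks;
# multiline = TRUE lets "^" and "$" in the lookahead anchor to a single line.
# The lookahead stops the match just before the next all-caps heading line
# (an assumption about how headings look in these reports).
pattern <- regex(
  "(?<=INDICATIONS FOR EXAMINATION).*?(?=\\n+^[A-Z][A-Z ]+$)",
  dotall = TRUE,
  multiline = TRUE
)

# str_trim() drops the line break left over after the heading.
indication <- str_trim(str_extract(report, pattern))

indication
```

str_extract() is vectorised, so the same pattern can be applied to a whole column such as EoE$Endo_ResultText. Rows where the section is missing, or where no heading follows it, come back as NA.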
