Reputation: 6874
I have a column in my dataframe that contains free text.
I would like to extract the text after INDICATIONS FOR EXAMINATION
and before the next capitalized line. In the example below, the result would be 'Anaemia':
INDICATIONS FOR EXAMINATION
Anaemia
PROCEDURE PERFORMED
Gastroscopy (OGD)
I am having some trouble because I'm using stringr and can't seem to get multiline matches.
I have been using:
EoE$IndicationsFroExamination<-str_extract(EoE$Endo_ResultText, '(?<=INDICATIONS FOR EXAMINATION).*?[A-Z]+')
Upvotes: 1
Views: 1060
Reputation: 46
It requires a little digging. You can use the regex() modifier function.
It has a multiline argument to switch on multiline fitting, so that ^ and $ match at the start and end of each line:
str_extract_all("a\nb\nc", "^.")
# [[1]]
# [1] "a"
str_extract_all("a\nb\nc", regex("^.", multiline = TRUE))
# [[1]]
# [1] "a" "b" "c"
It also has a dotall argument, which switches on multiline behaviour of ".*" by letting . match line breaks as well:
str_extract_all("a\nb\nc", "a.")
# [[1]]
# character(0)
str_extract_all("a\nb\nc", regex("a.", dotall = TRUE))
# [[1]]
# [1] "a\n"
These are documented in stringi::stri_opts_regex(), which stringr::regex() passes its arguments to.
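Applied to the report text in the question, the dotall option could be combined with a lookbehind and a lookahead roughly as follows. This is only a sketch: the `report` string is rebuilt here from the example in the question, and the pattern assumes that section headings consist solely of capital letters and spaces and that lines are separated by "\n".

```r
library(stringr)

# Example text rebuilt from the question (assumed to use "\n" line breaks)
report <- "INDICATIONS FOR EXAMINATION\nAnaemia\nPROCEDURE PERFORMED\nGastroscopy (OGD)"

# Lookbehind: start right after the heading line.
# .*? with dotall = TRUE: lazily consume text, including line breaks.
# Lookahead: stop before the next line made up only of capitals and spaces.
pattern <- regex(
  "(?<=INDICATIONS FOR EXAMINATION\\n).*?(?=\\n[A-Z][A-Z ]+(\\n|$))",
  dotall = TRUE
)

indication <- str_extract(report, pattern)
indication
```

Because str_extract() is vectorised, the same pattern could be passed a whole column such as EoE$Endo_ResultText. Rows without a following all-capitals heading would come back as NA under this pattern.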
Upvotes: 3
Reputation: 4721
I made the regular expression a bit more generic so it will match all occurrences, and used the str_extract_all function from stringr:
matches <- str_extract_all(str, "(?<=[A-Z]\n)([^\n]*)")
Which, given the string you provided, should return:
[[1]]
[1] "Anaemia" "Gastroscopy (OGD)"
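If the values need to be looked up by heading, one possible extension is to extract the headings separately and use them as names. This is a sketch under assumptions: `report`, `values`, `headers` and `sections` are names introduced here for illustration, headings are taken to be lines of capital letters and spaces only, and each heading is assumed to be followed by exactly one line of text.

```r
library(stringr)

# Example text rebuilt from the question (assumed to use "\n" line breaks)
report <- "INDICATIONS FOR EXAMINATION\nAnaemia\nPROCEDURE PERFORMED\nGastroscopy (OGD)"

# Same pattern as above: the rest of the line after a capital letter and a line break
values <- str_extract_all(report, "(?<=[A-Z]\n)([^\n]*)")[[1]]

# Headings: whole lines consisting only of capital letters and spaces
headers <- str_extract_all(report, regex("^[A-Z][A-Z ]+$", multiline = TRUE))[[1]]

# Pair each heading with the line that follows it
sections <- setNames(values, headers)
sections[["INDICATIONS FOR EXAMINATION"]]
```

The pairing relies on headings and values alternating strictly. A value line that ends in a capital letter, or a section spanning several lines, would break the one-to-one match.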
Upvotes: 2