Using a regex in tidyr's separate_rows() and its "sep" argument does not work

Question

I have these data:

df <- data.frame("author" = c("Kardos, NN (Fraunhofer Austria); Laflamme, NN (Fraunhofer Austria); Gallina, NN (Fraunhofer Austria); Sihn, NN (Fraunhofer Austria; TU Wien)", 
        "Demeter, NN (TU Wien; TU Wien); Derx, NN (TU Wien); Komma, NN (TU Wien); Parajka, NN (TU Wien); Schijven, NN (National Institute for Public Health and the Environment; Utrecht University); Sommer, NN (Medical University of Vienna)",
        "Prendl, NN (TU Wien); Schenzel, NN (TU Wien); Hofmann, NN (TU Wien)", 
        "Müller, NN (TU Wien); Knoll, NN (TU Wien; TU Wien); Gravogl, NN (TU Wien; University of Vienna); Jordan, NN (TU Wien); Eitenberger, NN (TU Wien); Friedbacher, NN (TU Wien); Artner, Werner (TU Wien); Welch, NN M. (TU Wien); Werner, NN (TU Wien)"
))

With a specific regex (which I got from here), I am able to extract each person. This works well:

stringr::str_extract_all(df$author, "\\w+,\\s*\\w+\\s*\\([^()]*(?:\\([^()]*\\)[^()]*)*\\);?")

However, the same regex does not work when I use tidyr::separate_rows():

tidyr::separate_rows(df, author, sep = "\\w+,\\s*\\w+\\s*\\([^()]*(?:\\([^()]*\\)[^()]*)*\\);?")

How come? What is the issue here, and how can I use that regex with separate_rows()?

Wiktor Stribiżew · Accepted Answer

The point is that a regex used for extraction matches the text you want to get. A regex used in a splitting function should match the separators instead: the function removes every match and splits the original string at the locations of the matches.
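A small sketch of that difference, assuming stringr and tidyr are installed. The string x is a shortened example built from two authors in the question's data, and person is the question's extraction pattern written as an R string (backslashes doubled); both names are only for illustration.

```r
library(stringr)
library(tidyr)

# Two authors taken from the question's data; Sihn has an inner ";"
x <- "Sihn, NN (Fraunhofer Austria; TU Wien); Derx, NN (TU Wien)"

# The question's extraction pattern as an R string literal
person <- "\\w+,\\s*\\w+\\s*\\([^()]*(?:\\([^()]*\\)[^()]*)*\\);?"

# Extraction keeps the matched text: one element per author
extracted <- str_extract_all(x, person)[[1]]
print(extracted)

# Splitting on the same pattern deletes every match, so only the
# leftovers between and around the authors (empty strings and a space) survive
split_rows <- separate_rows(data.frame(author = x), author, sep = person)
print(split_rows$author)
```

So separate_rows() is doing exactly what it is told; the pattern simply describes the wrong part of the string for a split.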

You can use

tidyr::separate_rows(df, author, sep = "(?<=\\));\\s*")

See the regex demo

Details

(?<=\)) - a location immediately preceded by )
; - a semicolon
\s* - zero or more whitespace characters.

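separate_rows() finds these matches, splits the original strings where they occur, and removes the matched text. A ; inside the parentheses, such as in "Sihn, NN (Fraunhofer Austria; TU Wien)", is not preceded by ) and therefore does not cause a split.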
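To check the split on the trickier cases, here is a sketch on df_small, a hypothetical two-row subset of the question's data: Sihn has an inner ; in the affiliations, and "Welch, NN M." has an extra token before the parenthesis. It assumes tidyr is installed.

```r
library(tidyr)

# Hypothetical subset of the question's data (rows shortened)
df_small <- data.frame(
  author = c(
    "Gallina, NN (Fraunhofer Austria); Sihn, NN (Fraunhofer Austria; TU Wien)",
    "Artner, Werner (TU Wien); Welch, NN M. (TU Wien); Werner, NN (TU Wien)"
  )
)

# Split only at a ";" that directly follows ")", and drop the spaces after it
out <- separate_rows(df_small, author, sep = "(?<=\\));\\s*")
print(out$author)
```

Each author ends up in its own row with the parenthesised affiliations intact, including the inner ; for Sihn.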