CptNemo

Reputation: 6755

gsub returns \n (newline)

I ran into a regex behaviour that I can't explain. My goal is to extract only the text after the @, yet when the string contains some words followed by \n, the result returned by gsub also includes the \n:

string <- ".@address something \n"
gsub("^\\.?@([a-z0-9_]{1,15})[^a-z0-9_]+.*$", "\\1", string, perl = TRUE)
# [1] "address\n"
string <- ".@address \n"
gsub("^\\.?@([a-z0-9_]{1,15})[^a-z0-9_]+.*$", "\\1", string, perl = TRUE)
# [1] "address"

Upvotes: 2

Views: 578

Answers (2)

akrun

Reputation: 887391

To extract address, you could also use:

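library(stringr)
# stringr 1.0 and later use ICU regular expressions, which support lookarounds,
# so the deprecated perl() wrapper is not needed
str_extract(string, "(?<=@)[a-z0-9_]+(?= )")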
#[1] "address"
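
The same lookaround idea can be sketched in base R without stringr, assuming the handle is always followed by a space as in the question's strings. The variable names m and handle are only illustrative:

```r
string <- ".@address something \n"

# Lookbehind for "@", then the handle characters, then a lookahead for a space.
# perl = TRUE is required because lookarounds are a PCRE feature.
m <- regexpr("(?<=@)[a-z0-9_]+(?= )", string, perl = TRUE)

# regmatches() returns only the text covered by the match.
handle <- regmatches(string, m)
handle
# [1] "address"
```

Because only the handle itself is matched, whatever follows it (including the \n) never enters the result.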

Upvotes: 0

Sven Hohenstein

Reputation: 81703

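In Perl-compatible regular expressions (perl = TRUE), . does not match \n by default, and $ can also match just before a trailing newline. So in your first string .* stops in front of the \n, $ matches there, and the newline stays outside the match and survives the replacement. In your second string the \n is consumed by [^a-z0-9_]+, because a negated character class does match newlines. This is in contrast to R's default regular expressions (perl = FALSE), where . matches \n. Have a look at this example: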

grepl(".", "\n", perl = FALSE)
# [1] TRUE
grepl(".", "\n", perl = TRUE)
# [1] FALSE
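
To see the effect on the string from the question, one option is to look at the part of the input that the pattern actually covers with perl = TRUE. This is a small sketch; the names pattern and matched are only illustrative:

```r
string <- ".@address something \n"
pattern <- "^\\.?@([a-z0-9_]{1,15})[^a-z0-9_]+.*$"

# Extract the text the whole pattern matches under PCRE.
matched <- regmatches(string, regexpr(pattern, string, perl = TRUE))
matched
# [1] ".@address something "

# The match is one character shorter than the input: the final "\n" is
# not part of it, so gsub() leaves it in place.
nchar(string) - nchar(matched)
# [1] 1
```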

Your code will work if you specify perl = FALSE:

gsub("^\\.?@([a-z0-9_]{1,15})[^a-z0-9_]+.*$", "\\1", string, perl = FALSE)
# [1] "address"
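
If you would rather keep perl = TRUE, a possible alternative (not part of the original code) is to prefix the pattern with the inline flag (?s), which makes . match newlines in PCRE as well. A minimal sketch:

```r
string <- ".@address something \n"

# (?s) switches on "dotall" mode, so .* also consumes the trailing "\n"
# and the whole string is replaced by the captured group.
res <- gsub("(?s)^\\.?@([a-z0-9_]{1,15})[^a-z0-9_]+.*$", "\\1", string, perl = TRUE)
res
# [1] "address"
```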

Upvotes: 3
