asieira

Reputation: 3683

Splitting strings with unescaped separator in R

I have to read a file with R, where a variable number of columns is separated by the | character. However, if it is preceded by a \ it should not be considered a separator.

I first thought something like strsplit(x, "[^\\][|]") would work, but the problem here is that the character before each pipe is "consumed":

> strsplit("word1|word2|word3\\|aha!|word4", "[^\\][|]")
[[1]]
[1] "word"        "word"        "word3\\|aha" "word4" 

Can anyone suggest a way to do this? Ideally it should be vectorized since the files in question are very large.

Upvotes: 4

Views: 165

Answers (2)

Anirudha

Reputation: 32787

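You need a zero-width assertion (a negative lookbehind), which checks that the pipe is not preceded by a backslash without consuming the preceding character: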

(?<!\\\\)[|]
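
To illustrate what "zero width" means here, the following sketch locates the matches with gregexpr instead of splitting. It assumes the pattern is passed as an R string literal with perl = TRUE, since R's default regex engine does not support lookbehind; the variable names are only for illustration.

```r
# Sample string from the question; "\\|" in R source is the two characters \|
x <- "word1|word2|word3\\|aha!|word4"

# Find every pipe that is not preceded by a backslash
m <- gregexpr("(?<!\\\\)[|]", x, perl = TRUE)[[1]]

# Start position of each match (the escaped pipe at position 19 is skipped)
pipe_positions <- as.integer(m)

# Each match is one character long: only the pipe, never the character before it
match_lengths <- attr(m, "match.length")

print(pipe_positions)
print(match_lengths)
```

Because the lookbehind only inspects the preceding character, the trailing digit of word1 and word2 is no longer swallowed by the separator.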

Upvotes: 4

Tyler Rinker

Reputation: 109844

I believe this works. It uses the regex from Anirudha's downvoted answer (not sure why it was downvoted: the regex is correct, it just doesn't work in strsplit unless you add perl=TRUE):

x <- "word1|word2|word3\\|aha!|word4"
strsplit(x, "(?<!\\\\)[|]", perl=TRUE)

## [[1]]
## [1] "word1"        "word2"        "word3\\|aha!" "word4" 
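
Since the question asks for something vectorized: strsplit already accepts a whole character vector and returns one list element per input string. A minimal sketch, assuming the file has been read into a character vector (for example with readLines); the sample lines and variable names are hypothetical, and the unescaping step is an optional extra that the question does not ask for.

```r
# Hypothetical sample lines standing in for the contents of the real file
lines <- c("word1|word2|word3\\|aha!|word4", "a\\|b|c")

# strsplit is vectorized over its first argument: one list element per line
fields <- strsplit(lines, "(?<!\\\\)[|]", perl = TRUE)

# Optional: turn the escaped \| inside each field back into a plain |
# fixed = TRUE makes gsub treat "\\|" as the literal two characters \|
unescaped <- lapply(fields, function(f) gsub("\\|", "|", f, fixed = TRUE))

print(unescaped)
```

Note that this assumes a backslash is never itself escaped. If the file can contain an escaped backslash directly before a real separator, the lookbehind will wrongly treat that pipe as escaped.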

Upvotes: 5
