Reputation: 3279
I have a string like "John a11|a12|\n Ana a21|a22|\n Jake a31|a23|\n "
I would like to extract all parts separated by "|" using a regex.
So I want the output
"John a11" "a12" "Ana a21" "a22" "Jake a31" "a23"
Any ideas how to write a proper regex for this, or does it just require some function in R?
Upvotes: 0
Views: 118
Reputation: 70732
You can split on |, followed by an optional newline and then "zero or more" spaces.
x <- 'John a11|a12|\n Ana a21|a22|\n Jake a31|a23|\n '
strsplit(x, '\\|\n? *')[[1]]
# [1] "John a11" "a12" "Ana a21" "a22" "Jake a31" "a23"
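If the regex escaping feels fiddly, here is a minimal sketch that avoids a regex altogether, assuming the fields themselves never contain "|": split on the literal "|" with fixed = TRUE, then strip the leftover newline and spaces with trimws() and drop empty pieces.

```r
x <- "John a11|a12|\n Ana a21|a22|\n Jake a31|a23|\n "
# Split on the literal "|" (fixed = TRUE, so no escaping is needed)
fields <- strsplit(x, "|", fixed = TRUE)[[1]]
# Remove the "\n " left at the start of pieces that followed "|\n "
fields <- trimws(fields)
# Drop empty pieces, e.g. the trailing "\n " after the final "|"
fields <- fields[nzchar(fields)]
fields
# [1] "John a11" "a12"      "Ana a21"  "a22"      "Jake a31" "a23"
```

Note that trimws() also removes leading or trailing spaces inside a field, which is usually what you want here.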
Upvotes: 3
Reputation: 4767
Alternatively, using rex may make this type of task a little simpler.
x <- "John a11|a12|\n Ana a21|a22|\n Jake a31|a23|\n "
library(rex)
re_matches(x,
  rex(
    any_spaces,
    capture(name = 'text',
      except_some_of("|")
    ),
    any_spaces),
  global = TRUE)[[1]]
#> text
#>1 John a11
#>2 a12
#>3 Ana a21
#>4 a22
#>5 Jake a31
#>6 a23
#>7
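The rex call extracts matches instead of splitting. A rough base-R analogue of that extract-the-matches idea, without the extra dependency, uses gregexpr() with regmatches(). The pattern below is my own choice, not what rex generates: a match must start with a character that is neither "|" nor whitespace, which also avoids the near-empty seventh row seen above.

```r
x <- "John a11|a12|\n Ana a21|a22|\n Jake a31|a23|\n "
# First char: not "|" and not whitespace; then anything up to the next "|" or newline
pattern <- "[^|[:space:]][^|\n]*"
# gregexpr() finds every match; regmatches() pulls out the matched text
matches <- regmatches(x, gregexpr(pattern, x))[[1]]
matches
# [1] "John a11" "a12"      "Ana a21"  "a22"      "Jake a31" "a23"
```

Trailing spaces inside a field would be kept by this pattern, so wrap the result in trimws() if that matters.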
Upvotes: 1
Reputation: 179448
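Try using strsplit() with the split regular expression "[\\||\n] *". Note that a "|" directly followed by a newline is matched as two separate separators, so the result contains empty strings: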
x <- "John a11|a12|\n Ana a21|a22|\n Jake a31|a23|\n "
strsplit(x, split="[\\||\n] *")[[1]]
# [1] "John a11" "a12" "" "Ana a21" "a22" "" "Jake a31" "a23" ""
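Two ways to get rid of those empty strings, sketched here as my own variants rather than part of the answer: filter the result with nzchar(), or add + to the character class so that "|\n " is consumed as a single separator. Inside a bracket expression | is not special, so [|\n] needs no escaping.

```r
x <- "John a11|a12|\n Ana a21|a22|\n Jake a31|a23|\n "
# Option 1: keep the original split, then drop the empty pieces
parts <- strsplit(x, split = "[|\n] *")[[1]]
parts <- parts[nzchar(parts)]
# Option 2: one separator absorbs a whole run of "|" and newlines,
# plus any spaces after it, so no empty strings are produced
parts2 <- strsplit(x, split = "[|\n]+ *")[[1]]
identical(parts, parts2)
# [1] TRUE
```

One caveat with option 2: it would also merge "||" into one separator, silently dropping a genuinely empty field, whereas option 1 at least lets you see where the empties were before filtering.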
Upvotes: 6