jjankowiak

Reputation: 3279

Extracting words separated by a special character (regex)

I have a string like "John a11|a12|\n Ana a21|a22|\n Jake a31|a23|\n " and I would like to extract all the parts separated by "|" using regex.

So I want the output

"John a11" "a12" "Ana a21" "a22" "Jake a31" "a23"

Any ideas how to build a proper regex for this, or does it just require some function in R?

Upvotes: 0

Views: 118

Answers (3)

hwnd

Reputation: 70732

You can split on "|" followed by an optional newline and then zero or more spaces.

x <- 'John a11|a12|\n Ana a21|a22|\n Jake a31|a23|\n  '
strsplit(x, '\\|\n? *')[[1]]
# [1] "John a11" "a12"      "Ana a21"  "a22"      "Jake a31" "a23"  
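If you would rather not write a regex at all, here is a sketch of an alternative that splits on the literal "|" and then trims the leftover whitespace. It assumes the same input string as above and uses only base R (trimws() needs R 3.2.0 or later); it is a different technique from the regex split, not a correction of it.

```r
# Same input as above: each field ends in "|", lines are separated by "\n" plus spaces
x <- 'John a11|a12|\n Ana a21|a22|\n Jake a31|a23|\n  '

# Split on the literal "|" (fixed = TRUE turns off regex interpretation)
parts <- strsplit(x, "|", fixed = TRUE)[[1]]

# Remove the newline/spaces left at the start of some pieces
parts <- trimws(parts)

# The trailing "\n  " after the last "|" trims down to ""; drop it
parts <- parts[nzchar(parts)]
parts
# [1] "John a11" "a12"      "Ana a21"  "a22"      "Jake a31" "a23"
```

This trades the regex for two simple steps, which can be easier to adapt if the whitespace between records varies more than a single newline plus spaces.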

Upvotes: 3

Jim

Reputation: 4767

Alternatively, using the rex package may make this type of task a little simpler.

x <- "John a11|a12|\n  Ana a21|a22|\n  Jake a31|a23|\n   "

library(rex)    
re_matches(x,
  rex(
      any_spaces,
      capture(name = 'text',
        except_some_of("|")
      ),
      any_spaces),
  global = TRUE)[[1]]
#>      text
#>1 John a11
#>2      a12
#>3  Ana a21
#>4      a22
#>5 Jake a31
#>6      a23
#>7
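For comparison, the same "match the pieces instead of splitting on the separator" idea can be sketched in base R with gregexpr() and regmatches(), avoiding the package dependency. The pattern "[^|]+" is an assumption meant to roughly correspond to except_some_of("|") in the rex version; trimming and filtering afterwards also removes the whitespace-only 7th match shown above.

```r
x <- "John a11|a12|\n  Ana a21|a22|\n  Jake a31|a23|\n   "

# Find every run of one or more characters that are not "|"
# (newlines and spaces are included in these runs)
runs <- regmatches(x, gregexpr("[^|]+", x))[[1]]

# Trim the surrounding newline/spaces, then drop the whitespace-only run
# that follows the final "|"
runs <- trimws(runs)
runs <- runs[nzchar(runs)]
runs
# [1] "John a11" "a12"      "Ana a21"  "a22"      "Jake a31" "a23"
```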

Upvotes: 1

Andrie

Reputation: 179448

Try using strsplit() with the split regular expression "[\\||\n] *", which splits on either "|" or a newline, each followed by any number of spaces:

x <- "John a11|a12|\n  Ana a21|a22|\n  Jake a31|a23|\n   "

strsplit(x, split="[\\||\n] *")[[1]]
# [1] "John a11" "a12"      ""         "Ana a21"  "a22"      ""         "Jake a31" "a23"      ""
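The empty strings appear because "|" immediately followed by "\n" counts as two adjacent separators. A minimal sketch of one way to drop them, assuming the same input, is to filter with nzchar(). Inside a bracket expression "|" is already literal, so a single "|" in the class behaves the same as the "\\||" used above for this input.

```r
x <- "John a11|a12|\n  Ana a21|a22|\n  Jake a31|a23|\n   "

# Split on "|" or a newline, each followed by any spaces
parts <- strsplit(x, split = "[|\n] *")[[1]]

# Adjacent "|" and "\n" separators leave empty strings; keep only non-empty pieces
parts <- parts[nzchar(parts)]
parts
# [1] "John a11" "a12"      "Ana a21"  "a22"      "Jake a31" "a23"
```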

Upvotes: 6
