Replace the second space with \n, if present, in R

I have a vector of text, let's say:

vector <- c("20 DE NOVIEMBRE",  "CENTRO", "EL ARENAL 4A SECCION",     "IGNACIO ZARAGOZA", "JARDIN BALBUENA", "MOCTEZUMA 2A SECCION",    "MORELOS", "PEON DE LOS BAOS")

I want to substitute the second space, if it exists, with the special character "\n".

I've tried this:

  vector <- gsub(".* .*( ).*", "\\\n", vector)

But it didn't work.

This is the expected result:

c("20 DE\nNOVIEMBRE",  "CENTRO", "EL ARENAL\n4A SECCION",     "IGNACIO ZARAGOZA", "JARDIN BALBUENA", "MOCTEZUMA 2A\nSECCION",    "MORELOS", "PEON DE\nLOS BAOS")

How can I get it?

Upvotes: 1

Views: 565

Answers (2)

Wiktor Stribiżew

Reputation: 627044

You may use

vector <- c("20 DE NOVIEMBRE",  "CENTRO", "EL ARENAL 4A SECCION",     "IGNACIO ZARAGOZA", "JARDIN BALBUENA", "MOCTEZUMA 2A SECCION",    "MORELOS", "PEON DE LOS BAOS")
sub("^\\S+\\s+\\S+\\K\\s+", "\n", vector, perl=TRUE)

Output of the R demo:

[1] "20 DE\nNOVIEMBRE"      "CENTRO"                "EL ARENAL\n4A SECCION"
[4] "IGNACIO ZARAGOZA"      "JARDIN BALBUENA"       "MOCTEZUMA 2A\nSECCION"
[7] "MORELOS"               "PEON DE\nLOS BAOS"    

The regex is ^\S+\s+\S+\K\s+ (see demo). It matches:

  • ^ - start of string
  • \S+ - 1+ non-whitespaces
  • \s+ - 1+ whitespaces
  • \S+ - 1+ non-whitespaces
  • \K - match reset operator, which discards all text matched so far from the match, so only the whitespace that follows is replaced
  • \s+ - 1+ whitespace chars.
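As a rough illustration of the breakdown above, the sketch below runs the \K pattern next to a capture-group variant that needs no perl = TRUE. The sample vector `x`, including the entry with a double space, and the names `with_k` and `with_group` are made up for this example and are not part of the original data.

```r
# Sample input (illustrative): the last entry has two spaces before "LOS".
x <- c("20 DE NOVIEMBRE", "CENTRO", "IGNACIO ZARAGOZA", "PEON DE  LOS BAOS")

# \K is a PCRE feature, so perl = TRUE is required here.
with_k <- sub("^\\S+\\s+\\S+\\K\\s+", "\n", x, perl = TRUE)

# Same idea without \K: capture the first two words and put them back.
with_group <- sub("^(\\S+\\s+\\S+)\\s+", "\\1\n", x)

# Both variants should agree on this input.
stopifnot(identical(with_k, with_group))
print(with_k)
```

Because the final \s+ matches one or more whitespace characters, a run of spaces in the second position collapses into a single "\n". Strings with fewer than three words do not match and are returned unchanged.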

Upvotes: 1

Tim Biegeleisen

Reputation: 521864

One approach, using sub with capture groups:

vector <- sub("^(\\S+) (\\S+) ", "\\1 \\2\n", vector)
vector

[1] "20 DE\nNOVIEMBRE"      "CENTRO"                "EL ARENAL\n4A SECCION"
[4] "IGNACIO ZARAGOZA"      "JARDIN BALBUENA"       "MOCTEZUMA 2A\nSECCION"
[7] "MORELOS"               "PEON DE\nLOS BAOS"    

Data:

vector <- c("20 DE NOVIEMBRE",  "CENTRO", "EL ARENAL 4A SECCION",
            "IGNACIO ZARAGOZA", "JARDIN BALBUENA", "MOCTEZUMA 2A SECCION",
            "MORELOS", "PEON DE LOS BAOS")

The regex captures the first and second words, each matched by \S+, and also consumes the first and second spaces. The pattern only matches if the input actually has a second space; otherwise the string is left unchanged. The replacement puts back both words and the first space, and substitutes a \n line feed for the second space.
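A minimal sketch of how this substitution might be wrapped and inspected. The helper name `break_at_second_space` and the variable `labels` are hypothetical, chosen only for this example. It also shows one consequence of using literal single spaces in the pattern: irregular spacing between the first two words prevents a match.

```r
# Hypothetical helper wrapping the capture-group substitution.
break_at_second_space <- function(x) {
  sub("^(\\S+) (\\S+) ", "\\1 \\2\n", x)
}

labels <- break_at_second_space(c("EL ARENAL 4A SECCION", "JARDIN BALBUENA"))

# print() shows the "\n" escape; cat() renders the actual line break.
print(labels)
cat(labels, sep = "\n---\n")

# The pattern uses literal single spaces, so a double space between the
# first two words means no match and the input comes back unchanged.
unchanged <- break_at_second_space("EL  ARENAL 4A")
print(unchanged)
```

If the input may contain tabs or repeated spaces, replacing the literal spaces in the pattern with \\s+ would be one way to make it more tolerant.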

Upvotes: 2
