aiorr

Reputation: 599

Inserting a character before specified character in hierarchical manner in R

test.dat <- c("abcde", "abcXe", "abcdY", "abcXY", "abYcXY", "abcYX")
test.want <- c("abcde", "abc1Xe", "abcd1Y", "abc1XY", "abYc1XY", "abcY1X")

Suppose I want to insert "1" before "X" or "Y", but only before "X" when the string contains both "X" and "Y".

library(tidyverse)
case_when(
  str_detect(test.dat, "X") ~ str_replace(test.dat, "X", "1X"),
  str_detect(test.dat, "Y") ~ str_replace(test.dat, "Y", "1Y"),
  TRUE ~ as.character(test.dat)
)

This works, but is there a more concise way to do it, perhaps with a single str_replace?

What about a second scenario, where "1" goes before either "X" or "Y", whichever comes first?

test.dat <- c("abcde", "abcXe", "abcdY", "abcXY", "abYcXY", "abcYX")
test.want <- c("abcde", "abc1Xe", "abcd1Y", "abc1XY", "ab1YcXY", "abc1YX")

stringr is preferable but I welcome any other methods. Thank you.

Upvotes: 3

Views: 108

Answers (2)

GKi

Reputation: 39707

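You can use a lookahead, (?=X) for X and (?=Y) for Y, and decide which one to apply by checking for an X with ifelse and grepl. The test vector below has one extra element, "YXXdY", to show what happens with repeated letters.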

test.dat <- c("abcde", "abcXe", "abcdY", "abcXY", "abYcXY", "abcYX", "YXXdY")

ifelse(grepl("X", test.dat)
     , sub("(?=X)", "1", test.dat, perl=TRUE)
     , sub("(?=Y)", "1", test.dat, perl=TRUE))
#[1] "abcde"   "abc1Xe"  "abcd1Y"  "abc1XY"  "abYc1XY" "abcY1X"  "Y1XXdY"

or

sub("(?=X)|(?=Y(?!.*X))", "1", test.dat, perl=TRUE)
#[1] "abcde"   "abc1Xe"  "abcd1Y"  "abc1XY"  "abYc1XY" "abcY1X"  "Y1XXdY"

Here (?=X) matches the position before an X, and (?=Y(?!.*X)) matches the position before a Y that has no X anywhere after it. An X that comes before such a Y is still preferred, because sub replaces only the leftmost match.
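As a quick check of the single-regex version, the sketch below wraps it in a small helper and compares the result with the expected vector from the question. The helper name insert_one_hier is made up for illustration; only base R's sub is used.

```r
# Hypothetical helper (the name is not from the original answer).
insert_one_hier <- function(x) {
  # (?=X)        : position before an X
  # (?=Y(?!.*X)) : position before a Y with no X anywhere after it
  sub("(?=X)|(?=Y(?!.*X))", "1", x, perl = TRUE)
}

test.dat  <- c("abcde", "abcXe", "abcdY", "abcXY", "abYcXY", "abcYX")
test.want <- c("abcde", "abc1Xe", "abcd1Y", "abc1XY", "abYc1XY", "abcY1X")

res_hier <- insert_one_hier(test.dat)
identical(res_hier, test.want)
# [1] TRUE
```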

To mark every hit rather than only the first:

ifelse(grepl("X", test.dat)
     , gsub("(?=X)", "1", test.dat, perl=TRUE)
     , gsub("(?=Y)", "1", test.dat, perl=TRUE))
#[1] "abcde"   "abc1Xe"  "abcd1Y"  "abc1XY"  "abYc1XY" "abcY1X"  "Y1X1XdY"

or

gsub("(?=X)|(^[^X]*)(?=Y(?!.*X))", "\\11", test.dat, perl=TRUE)
#[1] "abcde"   "abc1Xe"  "abcd1Y"  "abc1XY"  "abYc1XY" "abcY1X"  "Y1X1XdY"

And to match X or Y whichever comes first:

sub("(?=X)|(?=Y)", "1", test.dat, perl=TRUE)
#sub("(?=X|Y)", "1", test.dat, perl=TRUE) #Alternative
#sub("(?=[XY])", "1", test.dat, perl=TRUE) #Alternative
#[1] "abcde"   "abc1Xe"  "abcd1Y"  "abc1XY"  "ab1YcXY" "abc1YX"  "1YXXdY"
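The three spellings of the "whichever comes first" pattern can be checked against the expected result of the second scenario. The variable name test.want2 is introduced here only to keep it apart from the first scenario; the question reuses test.want.

```r
test.dat   <- c("abcde", "abcXe", "abcdY", "abcXY", "abYcXY", "abcYX")
test.want2 <- c("abcde", "abc1Xe", "abcd1Y", "abc1XY", "ab1YcXY", "abc1YX")

first_a <- sub("(?=X)|(?=Y)", "1", test.dat, perl = TRUE)  # two lookaheads
first_b <- sub("(?=X|Y)",     "1", test.dat, perl = TRUE)  # alternation inside one lookahead
first_c <- sub("(?=[XY])",    "1", test.dat, perl = TRUE)  # character class

identical(first_a, test.want2) &&
  identical(first_b, test.want2) &&
  identical(first_c, test.want2)
# [1] TRUE
```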

Upvotes: 5

Wiktor Stribiżew

Reputation: 627100

You can use a single pattern with two alternatives:

test.dat <- c("abcde", "abcXe", "abcdY", "abcXY", "abYcXY", "abcYX")
sub("^([^XY]*)(Y)([^X]*)$|(.*)(X)", "\\1\\41\\3\\5\\2", test.dat)
# => [1] "abcde"   "abc1Xe"  "abcd1Y"  "abc1XY"  "abYc1XY" "abcY1X" 

stringr::str_replace(test.dat, "^([^XY]*)(Y)([^X]*)$|(.*)(X)", "\\1\\41\\3\\5\\2")
# => [1] "abcde"   "abc1Xe"  "abcd1Y"  "abc1XY"  "abYc1XY" "abcY1X" 


Here,

  • ^([^XY]*)(Y)([^X]*)$ - start of string (^), Group 1: any zero or more chars other than X and Y (([^XY]*)), Group 2: Y ((Y)), Group 3: any zero or more chars other than X (([^X]*)), end of string ($)
  • | - or
  • (.*) - Group 4: any zero or more chars as many as possible
  • (X) - Group 5: X char.
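To see which groups are filled for a given input, one option is to swap the replacement for a template that prints each group between separators. This is only a diagnostic sketch; the bracketed template and the "YXXdY" input are not part of the original answer.

```r
pat <- "^([^XY]*)(Y)([^X]*)$|(.*)(X)"

# Print the five groups separated by "|" instead of building the final string.
# Groups that did not take part in the match come out empty.
groups_shown <- sub(pat, "[\\1|\\2|\\3|\\4|\\5]", c("abcdY", "abcXe"))
groups_shown
# [1] "[abcd|Y|||]" "[|||abc|X]e"

# Made-up input with repeated X: the greedy (.*) runs up to the last X,
# so the "1" lands before the last X rather than the first.
last_x <- sub(pat, "\\1\\41\\3\\5\\2", "YXXdY")
last_x
# [1] "YX1XdY"
```

Only one alternative can match a given string (the first needs a string without X, the second needs an X), which is why the replacement can safely interleave all five groups.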


If you also need to append 1 to strings that contain neither X nor Y:

test.dat <- c("abcde", "abcXe", "abcdY", "abcXY", "abYcXY", "abcYX")
sub("^([^XY]*)$", "\\11", sub("^([^XY]*)(Y)([^X]*)$|(.*)(X)", "\\1\\41\\3\\5\\2", test.dat))
 
library(stringr)
str_replace(str_replace(test.dat, "^([^XY]*)(Y)([^X]*)$|(.*)(X)", "\\1\\41\\3\\5\\2"), "^([^XY]*)$", "\\11")

Output:

[1] "abcde1"  "abc1Xe"  "abcd1Y"  "abc1XY"  "abYc1XY" "abcY1X" 
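The nested call can be easier to read when split into two named steps. The sketch below is the same base R logic; the names step1 and step2 are illustrative.

```r
test.dat <- c("abcde", "abcXe", "abcdY", "abcXY", "abYcXY", "abcYX")

# Step 1: hierarchical insert (before X if present, otherwise before Y).
step1 <- sub("^([^XY]*)(Y)([^X]*)$|(.*)(X)", "\\1\\41\\3\\5\\2", test.dat)

# Step 2: strings that contain neither X nor Y get "1" appended.
step2 <- sub("^([^XY]*)$", "\\11", step1)
step2
# [1] "abcde1"  "abc1Xe"  "abcd1Y"  "abc1XY"  "abYc1XY" "abcY1X"
```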

Upvotes: 2
