figurine

Reputation: 756

Inserting character dynamically into string in R

I'm trying to insert a "+" symbol into the middle of a postcode. The postcodes follow a pattern of AA111AA or AA11AA. I want the "+" to be inserted before the final digit, giving an output of either AA11+1AA or AA1+1AA. I've found a way to do this using stringr, but it feels like there should be an easier way than how I'm currently doing it. Below is my code.

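library(stringr)

pc <- "bt43xx"

# Join everything before the last three characters, a "+", and the last three characters
pc <- str_c(
      str_sub(pc, start = 1L, end = -4L), 
      "+", 
      str_sub(pc, start = -3L, end = -1L)
      )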

pc
[1] "bt4+3xx"

Upvotes: 2

Views: 6306

Answers (3)

G. Grothendieck

Reputation: 269586

Here are some alternatives. All solutions work whether pc is a scalar or a vector. No packages are needed. Of these, (3) seems particularly short and simple.

1) Match everything (.*) up to the last digit (\\d) and then replace that with the first capture (i.e. the match to the part within the first set of parens), a plus and the second capture (i.e. a match to the last digit).

sub("(.*)(\\d)", "\\1+\\2", pc)

2) An alternative which is even shorter is to match a digit followed by a non-digit and replace that with a plus followed by the match:

sub("(\\d\\D)", "+\\1", pc)
## [1] "bt4+3xx"

3) This one is even shorter than (2). It matches the last 3 characters replacing the match with a plus followed by the match:

sub("(...)$", "+\\1", pc)
## [1] "bt4+3xx"

4) This one splits the string into individual characters, inserts a plus in the appropriate position using append and puts the characters back together.

sapply(Map(append, strsplit(pc, ""), after = nchar(pc) - 3, "+"), paste, collapse = "")
## [1] "bt4+3xx"

If pc were known to be a scalar (as is the case in the question) it could be simplified to:

paste(append(strsplit(pc, "")[[1]], "+", nchar(pc) - 3), collapse = "")
## [1] "bt4+3xx"
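
As a quick check of the claim that the regex alternatives handle vectors, here is a minimal sketch that runs (1) to (3) on a vector covering both postcode formats. The sample postcodes and the names r1, r2 and r3 are made up for illustration.

```r
# Sample postcodes, one per format (made up for illustration)
pc <- c("bt43xx", "bt143xx")

# (1) greedy .* backtracks until the second group can take the last digit
r1 <- sub("(.*)(\\d)", "\\1+\\2", pc)

# (2) first digit that is immediately followed by a non-digit
r2 <- sub("(\\d\\D)", "+\\1", pc)

# (3) last three characters of the string
r3 <- sub("(...)$", "+\\1", pc)

# All three agree on these inputs: c("bt4+3xx", "bt14+3xx")
stopifnot(identical(r1, r2), identical(r2, r3))
```

Note that (3) relies on exactly two letters following the final digit, while (1) and (2) only rely on the final digit being followed by non-digits.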

Upvotes: 5

Shenglin Chen

Reputation: 4554

sub('(\\d)(?=\\D+$)', '+\\1', pc, perl = TRUE)
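
The pattern above is given without commentary, so here is a hedged, annotated sketch of how it appears to work: it captures a digit only when a lookahead confirms that nothing but non-digits follows up to the end of the string. The sample vector and the name res are illustrative assumptions.

```r
# Sample postcodes, one per format (made up for illustration)
pc <- c("bt43xx", "bt143xx")

# (\\d)       capture a single digit
# (?=\\D+$)   lookahead: one or more non-digits must follow, up to the end
# perl = TRUE switches to PCRE, which supports lookaheads
res <- sub("(\\d)(?=\\D+$)", "+\\1", pc, perl = TRUE)

# res is c("bt4+3xx", "bt14+3xx")
stopifnot(identical(res, c("bt4+3xx", "bt14+3xx")))
```

Because the lookahead consumes no characters, only the digit itself is replaced, so the trailing letters never need to be captured.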

Upvotes: 1

lmo

Reputation: 38500

This regular expression with sub and two back references should work.

sub("(\\d?)(\\d\\D*)$", "\\1+\\2", pc)
[1] "bt4+3xx"
  • \\d? matches 0 or 1 digits (0-9) and is captured by the first (). It captures a digit only when at least two digits precede the trailing letters.
  • \\d\\D* matches a digit followed by any run of non-digit characters, and is captured by the second (). \\D is used rather than [^\\d] because R's default regex engine (without perl = TRUE) does not reliably treat \\d as a digit class inside a bracket expression.
  • $ anchors the regular expression to the end of the string.
  • "\\1+\\2" replaces the match with the two captured groups, with a "+" between them.
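
To see the two-group approach on both postcode formats, here is a minimal sketch wrapped in a helper. The function name insert_plus and the sample postcodes are hypothetical, and the run of non-digits is written as \\D* here.

```r
# Hypothetical helper: insert "+" before the final digit of each postcode
insert_plus <- function(pc) {
  # group 1: optional digit before the final digit
  # group 2: the final digit plus the trailing non-digits, anchored at the end
  sub("(\\d?)(\\d\\D*)$", "\\1+\\2", pc)
}

out <- insert_plus(c("bt43xx", "bt143xx"))

# out is c("bt4+3xx", "bt14+3xx")
stopifnot(identical(out, c("bt4+3xx", "bt14+3xx")))
```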

Upvotes: 1
