simone

Reputation: 577

Insert character in string conditional on number of characters from beginning and end

I have a character vector of strings such as

myvar
[1] "VT" "AK" "AL2" "CA24" "NY12" 
[6] "AZ6" "WY4"

I would like to insert the character "0" after the second character in all strings that have 3 characters, and "01" at the end of the string in all strings that have two characters, in order to obtain the output

myvar
[1] "VT01" "AK01" "AL02" "CA24" "NY12" 
[6] "AZ06" "WY04"

I thought I could do this in one line using regex lookahead and lookbehind, but I can't get any further than this:

sub('(?<=.{2})(?=.{1})', '0', myvar, perl=T)

myvar
[1] "VT" "AK" "AL002" "CA024" "NY012" 
[6] "AZ06" "WY04"

Any help would be much appreciated,

Simone

Upvotes: 3

Views: 3342

Answers (4)

Avinash Raj

Reputation: 174796

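You can pass the output of one sub or gsub call as the input to another. Here the inner call inserts "0" before the third character of the 3-character strings, and the outer call appends "01" to the 2-character strings.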

myvar <- c("VT", "AK", "AL2", "CA24", "NY12",
           "AZ6", "WY4")
sub("^(.{2})$", "\\101", sub("^(.{2})(.)$", "\\10\\2", myvar))
# [1] "VT01" "AK01" "AL02" "CA24" "NY12" "AZ06" "WY04"
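
To see what each pass contributes, the two calls can be run separately. This is only a breakdown of the nested call above; the intermediate names step1 and result are introduced here for illustration.

```r
myvar <- c("VT", "AK", "AL2", "CA24", "NY12", "AZ6", "WY4")

# Pass 1: strings of exactly 3 characters get "0" between the 2nd and 3rd.
# "\\10\\2" is backreference 1, a literal "0", then backreference 2
# (R only recognises \1 to \9, so "\\10" is not "group 10").
step1 <- sub("^(.{2})(.)$", "\\10\\2", myvar)
step1
# [1] "VT"   "AK"   "AL02" "CA24" "NY12" "AZ06" "WY04"

# Pass 2: strings of exactly 2 characters get "01" appended.
# "\\101" is backreference 1 followed by the literal "01".
result <- sub("^(.{2})$", "\\101", step1)
result
# [1] "VT01" "AK01" "AL02" "CA24" "NY12" "AZ06" "WY04"
```

Because both patterns are anchored with ^ and $, strings that already have 4 characters pass through both calls untouched.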

Upvotes: 2

Cath

Reputation: 24074

Another option (could be a one-(long)liner...):

# dc_x holds one string split after its 2nd character: c("AL", "2") or just "VT"
mapply(function(x, dc_x){
           if(nchar(x)<4) paste0(dc_x[1], "0", ifelse(length(dc_x)-1, dc_x[2], "1")) else x
        }, 
       x=myvar, dc_x=strsplit(myvar,  "(?<=^.{2})", perl=TRUE))
#    VT     AK    AL2   CA24   NY12    AZ6    WY4 
# "VT01" "AK01" "AL02" "CA24" "NY12" "AZ06" "WY04" 

Explanation:
strsplit returns a list of vectors, one for each element of myvar, and the function receives them one at a time as dc_x. The first item of dc_x is the first 2 characters of the corresponding element of myvar. So, for elements shorter than 4 characters, you paste the first 2 characters with "01" if there are only 2 characters, or with "0" and the rest of the string if there are more than 2. Elements with 4 characters are returned unchanged.
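
The same branching can be written without the lookbehind split, which may make the three cases easier to see. This is a sketch of the logic described above, not the answer's code; pad_code, first2 and rest are hypothetical names introduced for illustration.

```r
myvar <- c("VT", "AK", "AL2", "CA24", "NY12", "AZ6", "WY4")

# pad_code is a hypothetical helper: it handles one string at a time.
pad_code <- function(x) {
  if (nchar(x) >= 4) return(x)          # already complete, leave as is
  first2 <- substr(x, 1, 2)             # the two leading characters
  rest   <- substr(x, 3, nchar(x))      # "" when x has only 2 characters
  paste0(first2, "0", if (nzchar(rest)) rest else "1")
}

result <- vapply(myvar, pad_code, character(1), USE.NAMES = FALSE)
result
# [1] "VT01" "AK01" "AL02" "CA24" "NY12" "AZ06" "WY04"
```

USE.NAMES = FALSE drops the element names that mapply keeps in the output shown above.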

Upvotes: 3

Tensibai

Reputation: 15784

A static cut-and-paste approach:

paste0(substr(myvar, 0, 2), sub("00", "01", gsub(" ", "0", sprintf("% 2s", substr(myvar, 3, 4)))))

# [1] "VT01" "AK01" "AL02" "CA24" "NY12" "AZ06" "WY04"

Take characters 3 to 4 with substr (an empty string for 2-character entries), pad them to 2 characters with spaces, replace the spaces with 0, then replace 00 with 01, and paste the result onto the first 2 characters.
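
Unrolled into separate steps, the pipeline looks roughly like this. The intermediate names are introduced here for illustration, and the sketch uses "%2s" for the padding on the assumption that it pads the same way as the "% 2s" format above.

```r
myvar <- c("VT", "AK", "AL2", "CA24", "NY12", "AZ6", "WY4")

tail_part <- substr(myvar, 3, 4)         # ""   ""   "2"  "24" "12" "6"  "4"
padded    <- sprintf("%2s", tail_part)   # "  " "  " " 2" "24" "12" " 6" " 4"
zeros     <- gsub(" ", "0", padded)      # "00" "00" "02" "24" "12" "06" "04"
suffix    <- sub("00", "01", zeros)      # "01" "01" "02" "24" "12" "06" "04"

result <- paste0(substr(myvar, 1, 2), suffix)
result
# [1] "VT01" "AK01" "AL02" "CA24" "NY12" "AZ06" "WY04"
```

One consequence of the last substitution is that an entry whose suffix really is "00" would also be turned into "01".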


A one-liner (without regex: it isn't needed here, and a regex can't really determine how much padding to add unless it is followed by a more complex choice of what to replace with what):

myvar[nchar(myvar)<4] <- paste0(myvar[nchar(myvar)<4],sprintf(paste0("%0",4-nchar(myvar[nchar(myvar)<4]),"i"),1))

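The goal is a vector of 4-character entries, so every entry shorter than 4 characters (myvar[nchar(myvar)<4]) gets a "1" appended, left-padded with zeros to a width of 4 minus the entry's length. Note that the padding always goes at the end: 2-character entries come out as requested ("VT" becomes "VT01"), but a 3-character entry such as "AL2" becomes "AL21", not "AL02".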

There's probably a way to use with() to avoid the repeated myvar[nchar(myvar)<4], but as I'm not used to it, I'm still digging.
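
A variant that computes the length test once and places the zero after the second character, so that 3-character entries come out as in the question, might look like this. It is a sketch rather than the one-liner above; the names short, first2 and rest are introduced here for illustration.

```r
myvar <- c("VT", "AK", "AL2", "CA24", "NY12", "AZ6", "WY4")

short  <- nchar(myvar) < 4               # logical mask, computed once
first2 <- substr(myvar[short], 1, 2)     # leading two characters
rest   <- substr(myvar[short], 3, 3)     # third character, or "" if absent
rest[rest == ""] <- "1"                  # 2-character entries default to "1"

myvar[short] <- paste0(first2, "0", rest)
myvar
# [1] "VT01" "AK01" "AL02" "CA24" "NY12" "AZ06" "WY04"
```

Storing the mask in short avoids repeating the subsetting expression without needing with().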

Upvotes: 5

akrun

Reputation: 887601

We can extract the numeric part using sub, convert it to numeric, change the NA values (produced when the empty string is coerced) to 1, and use sprintf to paste the non-numeric part (sub('\\d+', '', myvar)) together with the zero-padded numeric part.

 v1 <- as.numeric(sub('\\D+', '', myvar))
 v1[is.na(v1)] <- 1
 sprintf('%s%02d', sub('\\d+', '', myvar),v1)
 #[1] "VT01" "AK01" "AL02" "CA24" "NY12" "AZ06" "WY04"
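
Spelled out with the intermediate values, the steps above look roughly like this. The names prefix, digits and num are introduced here for illustration, and as.integer is used in place of as.numeric on the assumption that the codes are always whole numbers.

```r
myvar <- c("VT", "AK", "AL2", "CA24", "NY12", "AZ6", "WY4")

prefix <- sub("\\d+", "", myvar)   # "VT" "AK" "AL" "CA" "NY" "AZ" "WY"
digits <- sub("\\D+", "", myvar)   # ""   ""   "2"  "24" "12" "6"  "4"

num <- as.integer(digits)          # NA NA 2 24 12 6 4 ("" coerces to NA)
num[is.na(num)] <- 1L              # entries with no number default to 1

result <- sprintf("%s%02d", prefix, num)
result
# [1] "VT01" "AK01" "AL02" "CA24" "NY12" "AZ06" "WY04"
```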

Or use gsubfn. We use ifelse to append 1 to the elements that have no digits, then match the numeric part (\\d+) with gsubfn and replace it with its sprintf-formatted value.

 library(gsubfn)
 gsubfn('\\d+', ~sprintf('%02d', as.numeric(x)),
     ifelse(!grepl('\\d+', myvar), paste0(myvar, 1), myvar))
 #[1] "VT01" "AK01" "AL02" "CA24" "NY12" "AZ06" "WY04"

Or, slightly more compactly, use sub to append 1 to the elements that have no numeric part:

 gsubfn('\\d+', ~sprintf('%02d', as.numeric(x)) ,sub('(?<=[A-Z])$', '1', myvar, perl=TRUE))
 #[1] "VT01" "AK01" "AL02" "CA24" "NY12" "AZ06" "WY04"

Or, more compactly still, without the lookaround:

gsubfn('\\d+', ~sprintf('%02d', as.numeric(x)), sub('(\\D+)$', '\\11', myvar))
#[1] "VT01" "AK01" "AL02" "CA24" "NY12" "AZ06" "WY04"
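
If installing gsubfn is not an option, the same idea can be sketched in base R with regexpr and the replacement form of regmatches. This is an alternative to the gsubfn calls above, not a description of how gsubfn works internally; the names x and m are introduced here for illustration.

```r
myvar <- c("VT", "AK", "AL2", "CA24", "NY12", "AZ6", "WY4")

# Append "1" to entries that end in non-digits, so every entry has a number.
x <- sub("(\\D+)$", "\\11", myvar)   # "VT1" "AK1" "AL2" "CA24" ...

# Locate the first run of digits in each entry.
m <- regexpr("\\d+", x)

# Replace each run of digits with its zero-padded, two-digit form.
regmatches(x, m) <- sprintf("%02d", as.integer(regmatches(x, m)))
x
# [1] "VT01" "AK01" "AL02" "CA24" "NY12" "AZ06" "WY04"
```

The assignment relies on every element of x containing a match, which the sub call in the first step ensures for this input.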

Upvotes: 6
