Reputation: 577
I have a character vector of strings such as
myvar
[1] "VT" "AK" "AL2" "CA24" "NY12"
[6] "AZ6" "WY4"
I would like to insert the character "0" after the second character in all strings that have 3 characters, and "01" at the end of the string in all strings that have two characters, in order to obtain the output
myvar
[1] "VT01" "AK01" "AL02" "CA24" "NY12"
[6] "AZ06" "WY04"
I thought I could do this in one line using regex lookahead and lookbehind, but I can't get any further than this:
sub('(?<=.{2})(?=.{1})', '0', myvar, perl=T)
myvar
[1] "VT" "AK" "AL002" "CA024" "NY012"
[6] "AZ06" "WY04"
Any help would be much appreciated,
Simone
Upvotes: 3
Views: 3342
Reputation: 174796
You can pass the output of one sub or gsub call as the input to another sub or gsub call.
myvar <- c("VT", "AK", "AL2", "CA24", "NY12",
"AZ6", "WY4")
sub("^(.{2})$", "\\101", sub("^(.{2})(.)$", "\\10\\2", myvar))
# [1] "VT01" "AK01" "AL02" "CA24" "NY12" "AZ06" "WY04"
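To see why the nesting works, the two calls can be run one at a time. This is only an illustrative breakdown of the line above; the names step1 and result are introduced here and are not part of the original answer. Note that sub only recognises the backreferences \1 to \9, so "\\101" is read as group 1 followed by the literal text 01.

```r
myvar <- c("VT", "AK", "AL2", "CA24", "NY12", "AZ6", "WY4")

# Inner call: only 3-character strings match; "0" is placed between
# the first two characters (group 1) and the last one (group 2).
step1 <- sub("^(.{2})(.)$", "\\10\\2", myvar)
step1
# [1] "VT"   "AK"   "AL02" "CA24" "NY12" "AZ06" "WY04"

# Outer call: only the remaining 2-character strings match; "01" is appended.
result <- sub("^(.{2})$", "\\101", step1)
result
# [1] "VT01" "AK01" "AL02" "CA24" "NY12" "AZ06" "WY04"
```

Because both patterns are anchored with ^ and $, the 4-character strings pass through both calls unchanged.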
Upvotes: 2
Reputation: 24074
Another option (it could be written as one long line):
mapply(function(x, dc_x) {
  if (nchar(x) < 4) paste0(dc_x[1], "0", ifelse(length(dc_x) - 1, dc_x[2], "1")) else x
},
x = myvar, dc_x = strsplit(myvar, "(?<=^.{2})", perl = TRUE))
# VT AK AL2 CA24 NY12 AZ6 WY4
# "VT01" "AK01" "AL02" "CA24" "NY12" "AZ06" "WY04"
Explanation: dc_x is a list of vectors, one for each element of myvar, with the 1st item of each vector being the first 2 characters of the corresponding item in myvar. So, for elements of fewer than 4 characters, you paste the first 2 characters with "01" if there are only 2 characters, or with "0" and the rest of the string if there are more than 2 characters.
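It can help to look at what strsplit hands to the function. A small sketch, assuming the same myvar as above; parts is a name introduced here for illustration.

```r
myvar <- c("VT", "AK", "AL2", "CA24", "NY12", "AZ6", "WY4")

# Split each string right after its first two characters.
parts <- strsplit(myvar, "(?<=^.{2})", perl = TRUE)

parts[[1]]      # "VT": a single piece, so the function appends "01"
parts[[3]]      # "AL" "2": two pieces, so "0" goes between them
lengths(parts)  # number of pieces per element
```

The length(dc_x) - 1 test in the function relies on this: it is 0 (treated as FALSE) for a single piece and non-zero for two pieces.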
Upvotes: 3
Reputation: 15784
A static cut-and-paste approach:
paste0(substr(myvar, 1, 2), sub("00", "01", gsub(" ", "0", sprintf("%2s", substr(myvar, 3, 4)))))
# [1] "VT01" "AK01" "AL02" "CA24" "NY12" "AZ06" "WY04"
Take characters 3 and 4 with substr, pad them to 2 characters, replace the spaces with 0 and then 00 with 01, and paste the result after the first 2 characters.
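The same pipeline, unrolled so each intermediate value can be inspected. This is only a sketch of the steps described above; the variable names are introduced here, and it uses "%2s" as the padding format.

```r
myvar <- c("VT", "AK", "AL2", "CA24", "NY12", "AZ6", "WY4")

head2  <- substr(myvar, 1, 2)      # first two characters
tail2  <- substr(myvar, 3, 4)      # characters 3-4, "" when absent
padded <- sprintf("%2s", tail2)    # left-pad with spaces to width 2
zeroed <- gsub(" ", "0", padded)   # spaces become zeros
fixed  <- sub("00", "01", zeroed)  # an all-zero tail becomes "01"

result <- paste0(head2, fixed)
result
# [1] "VT01" "AK01" "AL02" "CA24" "NY12" "AZ06" "WY04"
```

One consequence of the last substitution is that an input whose tail really is "00" would also come out as "01", which does not matter for the data in the question.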
A one-liner (without regex: it isn't needed here, and a regex alone can't decide how much to insert based on the string length without some extra logic choosing what to replace with what):
myvar[nchar(myvar)<4] <- paste0(substr(myvar[nchar(myvar)<4], 1, 2), "0", ifelse(nchar(myvar[nchar(myvar)<4])==2, "1", substr(myvar[nchar(myvar)<4], 3, 3)))
The goal is to get a vector of 4-character entries, so every entry under 4 characters (myvar[nchar(myvar)<4]) is rebuilt as its first 2 characters, a "0", and then either its third character or, when it has only 2 characters, a "1".
There's probably a way to use with to avoid the redundant calls to myvar[nchar(myvar)<4], but as I'm not used to it, I'm still digging.
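One way to avoid repeating the subset, without with, is to compute the logical index once and reuse it. The sketch below is an assumption about what such a version could look like rather than a restatement of the line above; short and third are names introduced here. It keeps the first two characters, adds "0", and then either the third character or "1" when there is none.

```r
myvar <- c("VT", "AK", "AL2", "CA24", "NY12", "AZ6", "WY4")

short <- nchar(myvar) < 4             # computed once, reused below
third <- substr(myvar[short], 3, 3)   # "" for 2-character entries

myvar[short] <- paste0(substr(myvar[short], 1, 2), "0",
                       ifelse(third == "", "1", third))
myvar
# [1] "VT01" "AK01" "AL02" "CA24" "NY12" "AZ06" "WY04"
```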
Upvotes: 5
Reputation: 887601
We can extract the numeric part using sub, convert the string to numeric class, change the NA values (from coercion) to 1, and use sprintf to paste the non-numeric part (sub('\\d+', ...)) and the formatted numeric part.
v1 <- as.numeric(sub('\\D+', '', myvar))
v1[is.na(v1)] <- 1
sprintf('%s%02d', sub('\\d+', '', myvar),v1)
#[1] "VT01" "AK01" "AL02" "CA24" "NY12" "AZ06" "WY04"
Or use gsubfn. We create an ifelse condition that pastes 1 onto those elements that don't have any numeric part. We then match the numeric part (\\d+) in gsubfn and replace it by formatting it with sprintf.
library(gsubfn)
gsubfn('\\d+', ~sprintf('%02d', as.numeric(x)),
ifelse(!grepl('\\d+', myvar), paste0(myvar, 1), myvar))
#[1] "VT01" "AK01" "AL02" "CA24" "NY12" "AZ06" "WY04"
Or a slightly more compact version uses sub to append 1 to those elements that don't have a numeric part:
gsubfn('\\d+', ~sprintf('%02d', as.numeric(x)), sub('(?<=[A-Z])$', '1', myvar, perl=TRUE))
#[1] "VT01" "AK01" "AL02" "CA24" "NY12" "AZ06" "WY04"
Or to make it more compact without the lookarounds,
gsubfn('\\d+', ~sprintf('%02d', as.numeric(x)), sub('(\\D+)$', '\\11', myvar))
#[1] "VT01" "AK01" "AL02" "CA24" "NY12" "AZ06" "WY04"
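If installing gsubfn is not an option, the same "append 1, then zero-pad the digits" idea can be sketched in base R with regexpr and the regmatches<- replacement function. This is a swapped-in technique, not part of the answer above; x and m are names introduced here.

```r
myvar <- c("VT", "AK", "AL2", "CA24", "NY12", "AZ6", "WY4")

# Append "1" to entries that end in non-digits, as in the last variant above.
x <- sub("(\\D+)$", "\\11", myvar)

# Locate the first run of digits in each entry and overwrite it in place
# with its zero-padded form.
m <- regexpr("\\d+", x)
regmatches(x, m) <- sprintf("%02d", as.integer(regmatches(x, m)))
x
# [1] "VT01" "AK01" "AL02" "CA24" "NY12" "AZ06" "WY04"
```

This relies on every entry containing digits after the first step, which holds for the data in the question.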
Upvotes: 6