Reputation: 3126
Is it possible to use a regex on the following vector:
u<-c("first1","sec2","thir33","fourth4","fifth25","sixth16",
"seven7","eight8","nine9","ten10","eleven11")
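to obtain the following, where a period is inserted before the trailing digits that give each element's position in the vector: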
[1] "first.1" "sec.2" "thir3.3" "fourth.4" "fifth2.5" "sixth1.6" "seven.7" "eight.8"
[9] "nine.9" "ten.10" "eleven.11"
This is as close as I've gotten:
gsub("([A-Za-z]*)([1]{0,1})([0-9]$)","\\1\\.\\2\\3",u)
#[1] "first.1" "sec.2" "thir3.3" "fourth.4" "fifth2.5" "sixth.16" "seven.7" "eight.8" "nine.9" "ten.10"
#[11] "eleven.11"
Note that the sixth element is wrong: "sixth.16" should be "sixth1.6".
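A quick check makes the rule behind the desired output explicit: every element ends with its own position in the vector, which is why "sixth16" (position 6) should become "sixth1.6" while "ten10" (position 10) becomes "ten.10". A minimal sketch in base R, assuming R >= 3.3.0 for endsWith(); the name idx is only illustrative:

```r
u <- c("first1","sec2","thir33","fourth4","fifth25","sixth16",
       "seven7","eight8","nine9","ten10","eleven11")

# Position of each element as a string: "1", "2", ..., "11"
idx <- as.character(seq_along(u))

# Element-wise check that each string ends with its own position
endsWith(u, idx)

# TRUE when the rule holds for every element
all(endsWith(u, idx))
```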
Upvotes: 0
Views: 526
Reputation: 19454
Using DWin's answer as a jumping-off point, you can gain some speed (assuming your real problem involves a much longer vector) by noting that all elements within each of the ranges 1:9, 10:99, 100:999, and so on can be handled the same way, because every index in a given range has the same number of digits.
First, generate some larger test data:
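u<-c("first1","sec2","thir33","fourth4","fifth25","sixth16",
"seven7","eight8","nine9","ten10","eleven11")
u[12:101981]<-NA
set.seed(1)
# fill positions 12 onward with 5 random characters followed by the position itself
for(i in 12:101981)u[i]<-paste0(paste(sample(c(LETTERS,1:9),5),collapse=""),i)
lengthu<-length(u)
maxLength<-nchar(lengthu)           # digits in the largest index (6 here)
theStart<-10^(seq_len(maxLength)-1) # 1, 10, 100, ..., 1e5
theEnd<-c(theStart[-1]-1,lengthu)   # 9, 99, 999, ..., 101981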
Then use sapply, not over each element in u, but rather over a sequence of length maxLength:
tempans<-sapply(seq_len(maxLength),function(x){
# block x holds the indices with x digits: put the period before the last x digits
sub(paste0("(^.*)(\\d{",x,"})"),"\\1.\\2",u[theStart[x]:theEnd[x]])
})
tail(unlist(tempans))
# [1] "DWY96.101976" "UWFCO.101977" "UR5L8.101978" "XBQ9V.101979" "48MTI.101980"
# [6] "75LIS.101981"
head(unlist(tempans))
# [1] "first.1" "sec.2" "thir3.3" "fourth.4" "fifth2.5" "sixth1.6"
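The same per-block idea can also be written without computing theStart/theEnd by hand, by grouping positions on the number of digits in each index. A minimal sketch on the original 11-element vector; add_dot, idx, ndig and res are hypothetical names, and the pattern is anchored with $ as an extra safeguard:

```r
u <- c("first1","sec2","thir33","fourth4","fifth25","sixth16",
       "seven7","eight8","nine9","ten10","eleven11")

# Hypothetical helper: insert "." before the last k digits of every string in s
add_dot <- function(s, k) {
  sub(paste0("^(.*)(\\d{", k, "})$"), "\\1.\\2", s)
}

idx  <- seq_along(u)
ndig <- nchar(idx)               # digits in each index: 1 for 1:9, 2 for 10:11
res  <- character(length(u))
for (k in unique(ndig)) {
  sel <- ndig == k               # every position whose index has k digits
  res[sel] <- add_dot(u[sel], k) # one vectorized sub() call per digit count
}
res
```

As in the answer above, the number of sub() calls grows with the number of digits in length(u), not with length(u) itself.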
Upvotes: 1
Reputation: 263332
I don't see a built-in regex method that would "know" or have access to an element's position in a vector, but you can certainly pass the position in and use its character value (via as.character) in the pattern.
sapply(seq_along(u), function(x) sub(
paste("(^.+)(", as.character(x), "$)", sep=""),
"\\1.\\2", u[x]) )
[1] "first.1" "sec.2" "thir3.3" "fourth.4" "fifth2.5" "sixth1.6" "seven.7" "eight.8" "nine.9"
[10] "ten.10" "eleven.11"
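Since the position has to be supplied from outside the regex anyway, another option (a swap of technique, not regex-based) is to split each string by character count. This sketch assumes, as in the example data, that every element ends with exactly its own index; idx, k, n and res are illustrative names:

```r
u <- c("first1","sec2","thir33","fourth4","fifth25","sixth16",
       "seven7","eight8","nine9","ten10","eleven11")

idx <- seq_along(u)
k   <- nchar(idx)  # how many trailing characters make up the index
n   <- nchar(u)    # total length of each string

# Everything before the index, then ".", then the index itself (vectorized)
res <- paste0(substr(u, 1, n - k), ".", substr(u, n - k + 1, n))
res
```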
Upvotes: 4
Reputation: 13363
This isn't particularly pretty, but you can do it in one step with:
gsub("([A-Za-z]+)(10|11)?(?:(\\d)(\\d))?([0-9]{0,1}?)$","\\1\\3\\.\\2\\4\\5",u)
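For readability, the one-step pattern can be assembled piece by piece with a comment on each group. This sketch runs it with perl = TRUE (my assumption; the call above uses R's default engine) so that the non-capturing (?:...) group and the lazy {0,1}? follow PCRE semantics, and it writes the period in the replacement without a backslash:

```r
u <- c("first1","sec2","thir33","fourth4","fifth25","sixth16",
       "seven7","eight8","nine9","ten10","eleven11")

pattern <- paste0(
  "([A-Za-z]+)",      # group 1: the leading letters
  "(10|11)?",         # group 2: an optional literal 10 or 11
  "(?:(\\d)(\\d))?",  # groups 3 and 4: an optional pair of digits, e.g. "3","3" in "thir33"
  "([0-9]{0,1}?)",    # group 5: at most one more digit, matched lazily
  "$"                 # anchored at the end of the string
)

# Letters, first digit of a pair, ".", then 10/11 or second digit or lone digit
res <- gsub(pattern, "\\1\\3.\\2\\4\\5", u, perl = TRUE)
res
```

Note that the (10|11) alternative hard-codes indices 10 and 11, so the pattern would need extending for longer vectors.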
Alternatively, you can break it up into a few steps: handle the single-digit cases first, then the two-digit cases separately.
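v <- gsub("([A-Za-z]+)(\\d)$","\\1.\\2",u)     # letters + one digit: "first1" -> "first.1"
v <- gsub("([A-Za-z]+)(10|11)$","\\1.\\2",v)   # letters + 10 or 11: "ten10" -> "ten.10"
v <- gsub("([A-Za-z]+\\d)(\\d)$","\\1.\\2",v)  # other two-digit endings: "thir33" -> "thir3.3"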
Upvotes: 1