Reputation: 3872
I have a vector of sample locations; here is a sample:
test <- c("Aa, Heeswijk T1", "Aa, Heeswijk t1",
"Aa, Middelrode t2", "Aa, Middelrode p1",
"Aa, Heeswijk t1a", "Aa, Heeswijk t3b",
"Aa, test1 T1", "Aa, test2 t1")
These strings are made up of a location name ("Aa, Heeswijk"), a route code ("T1", "p2", "t3") and sometimes a subroute ("a" or "b"). Unfortunately, the route codes (t1, t2, p1, t1a) are sometimes in upper case and sometimes in lower case. I want all route codes in UPPER case, leaving the name and subroute unchanged. My expected outcome is:
"Aa, Heeswijk T1", "Aa, Heeswijk T1",
"Aa, Middelrode T2", "Aa, Middelrode P1",
"Aa, Heeswijk T1a", "Aa, Heeswijk T3b",
"Aa, test1 T1", "Aa, test2 T1"
I have looked at toupper(), but that changes the whole string. I could also use gsub:
gsub("t1","T1", test)
gsub("t2","T2", test)
#etc.
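To see why a plain gsub() is risky with this data: it replaces "t1" wherever it occurs, including inside location names such as "test1" in the sample. A quick check with the same call as above:

```r
test <- c("Aa, Heeswijk T1", "Aa, Heeswijk t1",
          "Aa, Middelrode t2", "Aa, Middelrode p1",
          "Aa, Heeswijk t1a", "Aa, Heeswijk t3b",
          "Aa, test1 T1", "Aa, test2 t1")

# gsub() replaces every occurrence of "t1", not just the route code
naive <- gsub("t1", "T1", test)
naive[7]
#[1] "Aa, tesT1 T1"
```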
But there must be a better R-ish way?!
Note: route codes always consist of two characters, a letter followed by a digit, optionally followed by a one-letter subroute, and they are always preceded by a space. So the character to convert to upper case is always the second- or third-to-last character of the string.
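The positional rule in the note can be applied directly with substr() and its replacement form substr<-, without a regex replacement. This is a sketch that assumes the note holds for every element (each string ends with the route code plus at most one subroute letter); grepl() is only used to decide whether a subroute letter is present.

```r
test <- c("Aa, Heeswijk T1", "Aa, Heeswijk t1",
          "Aa, Middelrode t2", "Aa, Middelrode p1",
          "Aa, Heeswijk t1a", "Aa, Heeswijk t3b",
          "Aa, test1 T1", "Aa, test2 t1")

n <- nchar(test)
# Assumption from the note: if the string ends in a digit, the route letter
# is 2nd-to-last; otherwise a subroute letter follows and it is 3rd-to-last
pos <- ifelse(grepl("[0-9]$", test), n - 1, n - 2)

# substr<- overwrites exactly one character per string (vectorised over pos)
out <- test
substr(out, pos, pos) <- toupper(substr(out, pos, pos))
out
```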
Upvotes: 3
Views: 3014
Reputation: 12580
If you'd like to avoid regex (which I wouldn't recommend doing), you can practice some R gymnastics:
df <- data.frame(do.call(rbind, strsplit(test, " ")), stringsAsFactors=FALSE)
Now you have everything split into columns of a data frame:
> df
X1 X2 X3
1 Aa, Heeswijk T1
2 Aa, Heeswijk t1
3 Aa, Middelrode t2
4 Aa, Middelrode p1
5 Aa, Heeswijk t1a
6 Aa, Heeswijk t3b
7 Aa, test1 T1
8 Aa, test2 t1
Next:
df[, 3] <- paste(toupper(substr(df[, 3], 1, 2)), substr(df[, 3], 3, nchar(df[, 3])), sep="")
will do your uppercasing:
> df
X1 X2 X3
1 Aa, Heeswijk T1
2 Aa, Heeswijk T1
3 Aa, Middelrode T2
4 Aa, Middelrode P1
5 Aa, Heeswijk T1a
6 Aa, Heeswijk T3b
7 Aa, test1 T1
8 Aa, test2 T1
Lastly, collapse it all back down:
ans <- apply(df, 1, paste, collapse=" ")
ans
...which gives you:
> ans
[1] "Aa, Heeswijk T1" "Aa, Heeswijk T1" "Aa, Middelrode T2" "Aa, Middelrode P1" "Aa, Heeswijk T1a" "Aa, Heeswijk T3b" "Aa, test1 T1"
[8] "Aa, test2 T1"
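The data frame step assumes every string splits into exactly three words; a location name with more or fewer spaces would make rbind() recycle the shorter vectors into misaligned columns. A sketch of the same substr()/toupper() gymnastics that only touches the last word of each string, assuming (as the question states) that the route code is always the final space-separated token. upper_route() is a hypothetical helper name:

```r
test <- c("Aa, Heeswijk T1", "Aa, Heeswijk t1",
          "Aa, Middelrode t2", "Aa, Middelrode p1",
          "Aa, Heeswijk t1a", "Aa, Heeswijk t3b",
          "Aa, test1 T1", "Aa, test2 t1")

# Hypothetical helper: upper-case the route letter in the last word only
upper_route <- function(x) {
  parts <- strsplit(x, " ", fixed = TRUE)
  vapply(parts, function(words) {
    k <- length(words)
    route <- words[k]
    # first two characters (letter + digit) upper-cased, subroute kept as-is
    words[k] <- paste0(toupper(substr(route, 1, 2)),
                       substr(route, 3, nchar(route)))
    paste(words, collapse = " ")
  }, character(1))
}

ans <- upper_route(test)
ans
```

Because it never builds a data frame, the number of words in the location name does not matter.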
Upvotes: 1
Reputation: 887891
We can use a regex lookahead. We match and capture (with parentheses) a lowercase letter at the start of a word, provided it is followed by a digit; the lookahead (?=[0-9]) checks for the digit without consuming it. In the replacement, \\U followed by the capture group \\1 converts the letter to upper case. Both the lookahead and \\U require perl=TRUE.
sub('\\b([a-z])(?=[0-9])', '\\U\\1', test, perl=TRUE)
#[1] "Aa, Heeswijk T1" "Aa, Heeswijk T1" "Aa, Middelrode T2"
#[4] "Meander Assendelft P1" "Aa, Heeswijk T1a" "Aa, Heeswijk T3b"
Alternatively, without the lookahead, we can do this with two capture groups:
sub('\\b([a-z])([0-9])', '\\U\\1\\2', test, perl=TRUE)
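In both versions the subroute letter ("a" or "b") survives because it is not part of the match and is copied through unchanged. \\U upper-cases everything that follows it in the replacement, so if you did capture the subroute you would need \\E (also perl=TRUE only) to stop the conversion. A small sketch on one element of 'test':

```r
x <- "Aa, Heeswijk t1a"

# Subroute captured in group 3 and placed after \U: it is upper-cased too
with_u <- sub("\\b([a-z])([0-9])([a-z]?)", "\\U\\1\\2\\3", x, perl = TRUE)
with_u
#[1] "Aa, Heeswijk T1A"

# \E right after group 1 ends the conversion, so digit and subroute keep their case
with_e <- sub("\\b([a-z])([0-9])([a-z]?)", "\\U\\1\\E\\2\\3", x, perl = TRUE)
with_e
#[1] "Aa, Heeswijk T1a"
```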
Testing with the updated 'test' vector from the OP's post:
sub('\\b([a-z])(?=[0-9])', '\\U\\1', test, perl=TRUE)
#[1] "Aa, Heeswijk T1" "Aa, Heeswijk T1" "Aa, Middelrode T2"
#[4] "Aa, Middelrode P1" "Aa, Heeswijk T1a" "Aa, Heeswijk T3b"
#[7] "Aa, test1 T1" "Aa, test2 T1"
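Because sub() only replaces the first match, the \\b pattern would pick up a letter+digit word inside the location name if one ever appeared before the route code. A stricter sketch uses the question's note that the route code is the last token, preceded by a space, and anchors the lookahead to the end of the string. The input "Aa, x1 t1" is hypothetical and only there to show the difference:

```r
test <- c("Aa, Heeswijk T1", "Aa, Heeswijk t1",
          "Aa, Middelrode t2", "Aa, Middelrode p1",
          "Aa, Heeswijk t1a", "Aa, Heeswijk t3b",
          "Aa, test1 T1", "Aa, test2 t1")

# space + route letter, then (lookahead) a digit, optional subroute, end of string
anchored <- sub(" ([a-z])(?=[0-9][a-z]?$)", " \\U\\1", test, perl = TRUE)

# Hypothetical location name containing a letter+digit word
odd <- "Aa, x1 t1"
loose  <- sub("\\b([a-z])(?=[0-9])", "\\U\\1", odd, perl = TRUE)
strict <- sub(" ([a-z])(?=[0-9][a-z]?$)", " \\U\\1", odd, perl = TRUE)
loose
#[1] "Aa, X1 t1"
strict
#[1] "Aa, x1 T1"
```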
Upvotes: 4