RHA

Reputation: 3872

Convert part of string to upper (or lower) case

I have a vector with sample locations, here's a sample:

test <- c("Aa, Heeswijk T1", "Aa, Heeswijk t1", 
          "Aa, Middelrode t2", "Aa, Middelrode p1",
          "Aa, Heeswijk t1a", "Aa, Heeswijk t3b",
          "Aa, test1 T1", "Aa, test2 t1")

These strings are made up of a location name ("Aa, Heeswijk"), a route code ("T1", "p2", "t3") and sometimes a subroute ("a" or "b"). Unfortunately, the route codes (t1, t2, p1, t1a) are sometimes in upper case and sometimes in lower case. I want all the route codes in UPPER case, leaving the name and subroute unchanged. My expected outcome is:

"Aa, Heeswijk T1", "Aa, Heeswijk T1", 
"Aa, Middelrode T2", "Aa, Middelrode P1",
"Aa, Heeswijk T1a", "Aa, Heeswijk T3b",
"Aa, test1 T1", "Aa, test2 T1"

I have looked at toupper(), but that changes the whole string. I could also use gsub:

gsub("t1","T1", test)
gsub("t2","T2", test)
#etc.

But there must be a better R-ish way?!
Note: Route codes are always 2 characters long (a letter followed by a digit) and are preceded by a space. So the character to convert to upper case is always the second or third from the end of the string.

Upvotes: 3

Views: 3014

Answers (2)

tumultous_rooster

Reputation: 12580

If you'd like to avoid regex (which I wouldn't recommend doing), you can practice some R gymnastics:

df <- data.frame(do.call(rbind, strsplit(test, " ")), stringsAsFactors=FALSE)

Now you have everything split into columns of a data frame:

> df
   X1         X2  X3
1 Aa,   Heeswijk  T1
2 Aa,   Heeswijk  t1
3 Aa, Middelrode  t2
4 Aa, Middelrode  p1
5 Aa,   Heeswijk t1a
6 Aa,   Heeswijk t3b
7 Aa,      test1  T1
8 Aa,      test2  t1

Next:

df[, 3]  <- paste(toupper(substr(df[, 3], 1, 2)), substr(df[, 3], 3, nchar(df[, 3])), sep="")

will do your uppercasing:

> df
   X1         X2  X3
1 Aa,   Heeswijk  T1
2 Aa,   Heeswijk  T1
3 Aa, Middelrode  T2
4 Aa, Middelrode  P1
5 Aa,   Heeswijk T1a
6 Aa,   Heeswijk T3b
7 Aa,      test1  T1
8 Aa,      test2  T1

Lastly, collapse it all back down:

ans <- apply(df, 1, paste, collapse=" ")
ans

...which gives you:

> ans
[1] "Aa, Heeswijk T1"   "Aa, Heeswijk T1"   "Aa, Middelrode T2" "Aa, Middelrode P1" "Aa, Heeswijk T1a"  "Aa, Heeswijk T3b"  "Aa, test1 T1"     
[8] "Aa, test2 T1"
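
One caveat with the steps above: do.call(rbind, ...) assumes every string splits into exactly three pieces. If a location name contained an extra space, the rows would be recycled with a warning. Here is a minimal sketch of the same regex-free idea that only touches the last space-separated token. The helper name upper_route and the "Aa, Den Bosch t2b" string are made up for illustration; they are not from the question.

```r
test <- c("Aa, Heeswijk T1", "Aa, Heeswijk t1",
          "Aa, Middelrode t2", "Aa, Middelrode p1",
          "Aa, Heeswijk t1a", "Aa, Heeswijk t3b",
          "Aa, test1 T1", "Aa, test2 t1")

# Hypothetical helper: upper-case the first two characters of the last
# space-separated token and leave everything else untouched.
upper_route <- function(x) {
  parts <- strsplit(x, " ", fixed = TRUE)   # fixed = TRUE: literal split, no regex
  vapply(parts, function(p) {
    n <- length(p)
    code <- p[n]                            # assumed to be the route code (+ subroute)
    p[n] <- paste0(toupper(substr(code, 1, 2)),
                   substr(code, 3, nchar(code)))  # "" when there is no subroute
    paste(p, collapse = " ")
  }, character(1))
}

upper_route(test)

# Made-up example with a two-word place name (four tokens instead of three)
upper_route("Aa, Den Bosch t2b")
```

This still relies on the route code being the last token, as the note in the question states.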

Upvotes: 1

akrun

Reputation: 887891

We can use a regex lookahead. We match a lower-case letter at a word boundary (\\b([a-z])), capture it as a group (using parentheses), and require that it is followed by a digit via the lookahead ((?=[0-9])). In the replacement, \\U followed by the capture group (\\1) converts it to upper case; this needs perl=TRUE.

 sub('\\b([a-z])(?=[0-9])', '\\U\\1', test, perl=TRUE)
 # (output for the 'test' vector as first posted, before the question was updated)
 #[1] "Aa, Heeswijk T1"       "Aa, Heeswijk T1"       "Aa, Middelrode T2"    
 #[4] "Meander Assendelft P1" "Aa, Heeswijk T1a"      "Aa, Heeswijk T3b"    

Or without using the lookarounds, we can do this with two capture groups.

 sub('\\b([a-z])([0-9])', '\\U\\1\\2', test, perl=TRUE)

Update

Testing with the updated 'test' from the OP's post

sub('\\b([a-z])(?=[0-9])', '\\U\\1', test, perl=TRUE)
#[1] "Aa, Heeswijk T1"   "Aa, Heeswijk T1"   "Aa, Middelrode T2"
#[4] "Aa, Middelrode P1" "Aa, Heeswijk T1a"  "Aa, Heeswijk T3b" 
#[7] "Aa, test1 T1"      "Aa, test2 T1"     
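
If the note in the question holds (the route code is preceded by a space and sits at the end of the string, optionally followed by one subroute letter), the pattern could also be anchored to the end of the string. This is only a sketch under that assumption: the [a-z]? for the subroute and the "Aa, b2 weg t1a" string are my own guesses for illustration, not taken from the question. \\E ends the \\U case conversion, so the subroute letter stays in lower case.

```r
test <- c("Aa, Heeswijk T1", "Aa, Heeswijk t1",
          "Aa, Middelrode t2", "Aa, Middelrode p1",
          "Aa, Heeswijk t1a", "Aa, Heeswijk t3b",
          "Aa, test1 T1", "Aa, test2 t1")

# space + lower-case letter + digit + optional subroute letter, at end of string
anchored <- sub(" ([a-z])([0-9][a-z]?)$", " \\U\\1\\E\\2", test, perl = TRUE)
anchored

# Made-up string whose name part contains a letter+digit token ("b2")
tricky <- "Aa, b2 weg t1a"

# The word-boundary pattern converts the first match, which is "b2" here
first_match <- sub("\\b([a-z])(?=[0-9])", "\\U\\1", tricky, perl = TRUE)

# The anchored pattern only looks at the end of the string
end_match <- sub(" ([a-z])([0-9][a-z]?)$", " \\U\\1\\E\\2", tricky, perl = TRUE)
```

For the 'test' vector in the question both patterns give the same result; the anchored one only matters if a name can contain something that looks like a route code.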

Upvotes: 4
