ceph

Reputation: 96

Inserting delimiter before nth uppercase letter in R string

I currently have a data frame of imported CSV data. It's a list of first and last names, job titles, and company names. Each entry is on a separate row. The first and last names, job title, and company name are all capitalized.

Each row is in this format:

First LastTitle, Company

I want to insert a comma delimiter before "Title" so that I can then sort the data into three columns, like the second answer on this question: splitting comma separated mixed text and numeric string with strsplit in R.

Essentially, in this specific case I want to locate the 3rd uppercase letter in each string, and then insert a comma delimiter before it.

This answer shows how to split a string on uppercase letters, but seems to only find the first uppercase letter: Splitting String based on letters case.

Any suggestions are appreciated.

Upvotes: 2

Views: 1536

Answers (3)

Shenglin Chen

Reputation: 4554

Try this:

str <- "First LastTitle, Company"
gsub('([a-z])(?=[A-Z])', '\\1,', str, perl = TRUE)
[1] "First Last,Title, Company"
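One caveat: gsub() inserts a comma at every place where a lowercase letter is directly followed by an uppercase one, not only before the third uppercase letter. The sketch below uses a made-up input (the string with "PayPal" is an assumption, not data from the question) to show the difference between gsub() and sub() here. Note that sub() only helps if the first such boundary really is the name/title boundary, which would not hold for a surname like "McDonald".

```r
# Assumed example input (not from the question): a company name with an internal capital
x <- "Jane DoeEngineer, PayPal"

# gsub() inserts a comma at every lowercase-to-uppercase boundary
all_boundaries <- gsub("([a-z])(?=[A-Z])", "\\1,", x, perl = TRUE)

# sub() stops after the first such boundary
first_boundary <- sub("([a-z])(?=[A-Z])", "\\1,", x, perl = TRUE)

print(all_boundaries)  # "Jane Doe,Engineer, Pay,Pal"
print(first_boundary)  # "Jane Doe,Engineer, PayPal"
```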

Upvotes: 1

scoa

Reputation: 19857

You could insert a comma after the first two runs of one uppercase letter followed by one or more non-uppercase characters:

x <- "First LastTitle, Company"

sub("(([A-Z][^A-Z]+){2})(.*)","\\1,\\3",x)
[1] "First Last,Title, Company"
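If the position might not always be the third uppercase letter, the same idea can probably be generalized by building the pattern with sprintf(). This is only a sketch: the helper name insert_before_nth_upper and its arguments are hypothetical, and it assumes the delimiter contains no backslashes.

```r
# Hypothetical helper (not part of the original answer): insert `delim`
# before the n-th uppercase letter of each element of x.
insert_before_nth_upper <- function(x, n, delim = ",") {
  # Optional leading non-uppercase text, then n - 1 runs of
  # "one uppercase letter + any non-uppercase characters",
  # followed (lookahead) by the n-th uppercase letter.
  pattern <- sprintf("^([^A-Z]*(?:[A-Z][^A-Z]*){%d})(?=[A-Z])", n - 1)
  sub(pattern, paste0("\\1", delim), x, perl = TRUE)
}

res <- insert_before_nth_upper("First LastTitle, Company", 3)
print(res)  # "First Last,Title, Company"
```

Strings with fewer than n uppercase letters do not match the pattern and are returned unchanged.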

Upvotes: 0

mattdevlin

Reputation: 1095

Split the string into a character vector, use grep to find the positions of the uppercase letters, and take the third position.

str <- "First LastTitle, Company"
tmp_str <- unlist(strsplit(str, ""))
ind <- grep("[A-Z]", tmp_str)[3]
paste0(c(tmp_str[1:(ind-1)], ",", tmp_str[ind:nchar(str)]), collapse="")
#[1] "First Last,Title, Company"
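The same position-based idea can be applied to a whole column at once and used to fill the three columns directly, without inserting a comma first. This is a rough sketch under assumptions: the data frame df, its column raw, and the second row are made up for illustration, and every row is assumed to contain at least three uppercase letters and a comma before the company.

```r
# Assumed example data; the column name `raw` is hypothetical
df <- data.frame(raw = c("First LastTitle, Company",
                         "Jane DoeEngineer, Acme"),
                 stringsAsFactors = FALSE)

# Position of the 3rd uppercase letter in each string (NA if there are fewer than 3)
pos <- vapply(gregexpr("[A-Z]", df$raw), function(m) m[3], integer(1))

# Everything before that position is the name; the rest is "Title, Company"
df$name <- substr(df$raw, 1, pos - 1)
rest    <- substr(df$raw, pos, nchar(df$raw))

# Split the remainder at the first comma
df$title   <- sub(",.*$", "", rest)
df$company <- trimws(sub("^[^,]*,", "", rest))

print(df[, c("name", "title", "company")])
```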

Upvotes: 3
