R: get first letters of double/triple-barrel surnames in a data.frame

Question

I have a dataframe with 2 columns:

> df1
      Surname      Name
1 The Builder       Bob
2  Zeta-Jones Catherine

I want to add a third column "Shortened_Surname" which contains the first letters of all the words in the surname field:

      Surname      Name Shortened_Surname
1 The Builder       Bob                TB
2  Zeta-Jones Catherine                ZJ

Note the "-" in the second name. I have barreled surnames separated by spaces and hyphens.

I have tried:

Step 1:

> strsplit(unlist(as.character(df1$Surname))," ")
[[1]]
[1] "The"     "Builder"

[[2]]
[1] "Zeta-Jones"

My research suggests I could possibly use strtrim as a Step 2, but so far all I have found are ways not to do it.

Jota · Accepted Answer

You can target the space, hyphen, and beginning of the line with lookarounds. For instance, any character (.) not preceded by the beginning of the line, a space, or a hyphen should be replaced with "":

with(df1, gsub("(?<!^|[ -]).", "", Surname, perl=TRUE))

or

with(df1, gsub("(?<=[^ -]).", "", Surname, perl=TRUE))


The second gsub replaces with an empty string ("") any character that is preceded by a character other than " " or "-". The first letter of the surname has nothing before it, so it is kept as well.
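
To see the lookbehind in action end to end, here is a minimal sketch that rebuilds the two-row data frame from the question and stores the result in the new column. The split-based variant at the end is not part of the answer above; it is an assumed alternative that continues the asker's Step 1 by splitting on both separators, and the names parts and initials are only illustrative.

```r
# Rebuild the example data from the question; stringsAsFactors = FALSE keeps
# both columns as plain character vectors.
df1 <- data.frame(
  Surname = c("The Builder", "Zeta-Jones"),
  Name    = c("Bob", "Catherine"),
  stringsAsFactors = FALSE
)

# Lookbehind approach: delete every character that directly follows a
# character other than a space or a hyphen. perl = TRUE is required because
# lookbehind is PCRE syntax.
df1$Shortened_Surname <- gsub("(?<=[^ -]).", "", df1$Surname, perl = TRUE)

# Assumed alternative: split each surname on a space or hyphen, then paste
# together the first letter of every piece.
parts <- strsplit(df1$Surname, "[ -]")
initials <- vapply(
  parts,
  function(p) paste(substr(p, 1, 1), collapse = ""),
  character(1)
)

print(df1)
```

For these two surnames both approaches give "TB" and "ZJ". The regex version is shorter, while the split version may be easier to adapt if other separators turn up in the data.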
