IRNotSmart

Reputation: 371

Formatting Character strings (First and Last Names) in a long character vector in R

I have many names of people in my character vector:

MLB$Names[1:4]
[1] "Derek Jeter" "Robinson Cano" "Nick Markakis" "David Ortiz"

I would like to format them so that each contains the first initial with a period, followed by a space and the last name. I want the result to look like the following:

MLB$NamesFormatted[1:4]
[1] "D. Jeter" "R. Cano" "N. Markakis" "D. Ortiz"

I'm assuming the best way to attack this would be by using grep or sub, but I can't for the life of me figure it out. I'm still a rookie at using R, but I'm loving all of its capabilities!

Any help would be greatly appreciated! Thank you!

Upvotes: 1

Views: 136

Answers (2)

Vincent Bonhomme

Reputation: 7443

We can use strsplit and paste:

x <- c("Derek Jeter", "Robinson Cano", "Nick Markakis", "David Ortiz")

sapply(strsplit(x, " "), function(x) paste0(substr(x[1], 1, 1), ". ", x[2]))

[1] "D. Jeter"    "R. Cano"     "N. Markakis" "D. Ortiz" 

We first split each string into first name and surname, which gives a list; we then sapply over it with an anonymous function that: i) takes the initial of the first name, ii) adds a dot and a space, iii) adds the family name.
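The steps described above can be spelled out one at a time. This is only a sketch: the helper name fmt_name is hypothetical, and vapply is swapped in for sapply so that each result is checked to be a single string. It assumes every entry has exactly one first name and one last name separated by a single space.

```r
x <- c("Derek Jeter", "Robinson Cano", "Nick Markakis", "David Ortiz")

# Step 1: split each name on the space; the result is a list of character vectors
parts <- strsplit(x, " ", fixed = TRUE)
parts[[1]]
# [1] "Derek" "Jeter"

# Step 2: hypothetical helper that builds "initial. surname" from one split name
fmt_name <- function(p) paste0(substr(p[1], 1, 1), ". ", p[2])

# Step 3: apply the helper to every element; vapply enforces one string per name
formatted <- vapply(parts, fmt_name, character(1))
formatted
# [1] "D. Jeter"    "R. Cano"     "N. Markakis" "D. Ortiz"
```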

Upvotes: 1

akrun

Reputation: 887213

We can use sub with two capture groups. The first group, ^(.), captures the first character of the string; \\S+ then matches the rest of the first name (one or more non-whitespace characters); the second group, (\\s+.*), captures one or more whitespace characters and everything after them up to the end ($) of the string. We replace the match with the first backreference (\\1), followed by a period, followed by the second backreference (\\2).

sub("^(.)\\S+(\\s+.*)$", "\\1.\\2", MLB$Names)
#[1] "D. Jeter"    "R. Cano"     "N. Markakis" "D. Ortiz"  
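To see what each capture group in the pattern above holds, one option is to replace the whole match with a single backreference at a time. A minimal sketch; the variable names names_vec, pattern, first_group, second_group and result are illustrative only.

```r
names_vec <- c("Derek Jeter", "Robinson Cano")
pattern <- "^(.)\\S+(\\s+.*)$"

# Group 1: the first character of the string
first_group <- sub(pattern, "\\1", names_vec)
first_group
# [1] "D" "R"

# Group 2: the whitespace plus everything after it (note the leading space)
second_group <- sub(pattern, "\\2", names_vec)
second_group
# [1] " Jeter" " Cano"

# Putting a period between the two groups gives the formatted name
result <- sub(pattern, "\\1.\\2", names_vec)
result
# [1] "D. Jeter" "R. Cano"
```

Because the space travels with the second group, the replacement string does not need one of its own.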

Or, more compactly, we can match the first run of one or more lowercase letters ([a-z]+) and replace it with a period (.). Because sub replaces only the first match, the last name is left untouched.

sub("[a-z]+", ".", MLB$Names)
#[1] "D. Jeter"    "R. Cano"     "N. Markakis" "D. Ortiz"  
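The compact pattern relies on sub stopping after the first match, and on the first name being one capital letter followed only by lowercase letters. A small sketch of both points; the name "DeShawn Jackson" is a made-up input used purely for illustration.

```r
# sub replaces only the first run of lowercase letters
first_only <- sub("[a-z]+", ".", "Derek Jeter")
first_only
# [1] "D. Jeter"

# gsub replaces every run, so the last name is shortened as well
every_run <- gsub("[a-z]+", ".", "Derek Jeter")
every_run
# [1] "D. J."

# A first name with a capital in the middle breaks the assumption:
# only the "e" before "Shawn" is replaced
mixed_case <- sub("[a-z]+", ".", "DeShawn Jackson")
mixed_case
# [1] "D.Shawn Jackson"
```

For names like the last one, the capture-group version with ^(.)\\S+ is the safer choice, since it does not depend on letter case.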

Here is another option with strsplit, where we split on one or more lowercase letters followed by one or more spaces ([a-z]+\\s+). This leaves the initial and the last name as separate pieces; we then loop over the list with vapply and paste the pieces together with ". " as the separator.

vapply(strsplit(MLB$Names, "[a-z]+\\s+"), paste, collapse=". ", character(1))
#[1] "D. Jeter"    "R. Cano"     "N. Markakis" "D. Ortiz"   
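It may help to look at the intermediate list before it is pasted back together. A brief sketch on two of the names; the variable names two_names, pieces and joined are illustrative.

```r
two_names <- c("Derek Jeter", "Robinson Cano")

# The split pattern consumes the rest of the first name plus the space,
# leaving only the initial and the last name
pieces <- strsplit(two_names, "[a-z]+\\s+")
pieces[[1]]
# [1] "D"     "Jeter"

# collapse = ". " joins the two pieces of each element into one string
joined <- vapply(pieces, paste, character(1), collapse = ". ")
joined
# [1] "D. Jeter" "R. Cano"
```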

Data

MLB <- data.frame(Names = c("Derek Jeter", "Robinson Cano", 
              "Nick Markakis", "David Ortiz"), stringsAsFactors=FALSE)

Upvotes: 1
