Tammi

Reputation: 1

R: Extract first and last names only from a list of names

I am using R for data manipulation. I have a very long list of names that looks like this:

"names"

[1] ""                               
[2] "Victoria Marie"                 
[3] "Ori Mann"                     
[4] "Lina Pearl Right"          
[5] "David Berg"                     
[6] "Anthony Lee"                  
[7] "Brian Michael Ingraham"         
[8] "Jay Ling"             

I want to extract only the first and last name from each entry into new columns and discard any middle names. How do I do this? I tried the following code:

mat  = matrix(unlist(names), ncol=2, byrow=TRUE)

This just runs through all the names in each entry and throws them all into columns in order.

Any help would be greatly appreciated.

Upvotes: 0

Views: 2511

Answers (1)

ulfelder

Reputation: 5335

Here's a way to do this in base R that also deals with the possibility of suffixes. If you discover additional suffixes (e.g., 'II'), you can add them to the vector that follows %in%.

# some representative data
names <- list("", "Ed Smith", "Jennifer Jason Leigh", "Ed Begley, Jr.")

# use strsplit to get a list of vectors of each name broken into its parts,
# keying off the space between names
names.split <- strsplit(unlist(names), " ")

# make new vectors with the first and last names, based on their position in
# those vectors. for last names, make the result conditional on whether or
# not a recognized suffix is in the last spot, and get rid of any 
# punctuation attached to the last name if there was a suffix.
name.first <- sapply(names.split, function(x) x[1])
name.last <- sapply(names.split, function(x) {

  # this deals with empty name slots in your original list, returning NA
  if (length(x) == 0) {

    NA

  # now check for a suffix; if one is there, use the penultimate item
  # after stripping it of any punctuation
  } else if (x[length(x)] %in% c("Jr.", "Jr", "Sr.", "Sr")) {

    gsub("[[:punct:]]", "", x[length(x) - 1])

  } else {

    x[length(x)]

  }
})

Results:

> name.first
[1] NA         "Ed"       "Jennifer" "Ed"      
> name.last
[1] NA       "Smith"  "Leigh"  "Begley"
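
Since the question asks for the first and last names as new columns, here is a minimal sketch that wraps the same logic in a function and returns a data frame. The function name split_first_last, its suffixes argument, and the column names first and last are my own illustrative choices, not anything built into R. It also assumes names are separated by single spaces and guards against an entry that consists of a suffix alone.

```r
# Hypothetical helper: split_first_last and its arguments are illustrative names.
# Returns a data frame with one row per entry and columns `first` and `last`.
split_first_last <- function(full_names,
                             suffixes = c("Jr.", "Jr", "Sr.", "Sr")) {
  # break each name into its space-separated parts
  parts <- strsplit(unlist(full_names), " ")

  # the first part is the first name; empty entries yield NA
  first <- vapply(parts, function(x) x[1], character(1))

  last <- vapply(parts, function(x) {
    n <- length(x)
    if (n == 0) {
      # empty entry
      NA_character_
    } else if (n > 1 && x[n] %in% suffixes) {
      # recognized suffix: take the item before it, minus punctuation
      gsub("[[:punct:]]", "", x[n - 1])
    } else {
      x[n]
    }
  }, character(1))

  data.frame(first = first, last = last, stringsAsFactors = FALSE)
}

# the sample names from the question
people <- split_first_last(c("", "Victoria Marie", "Ori Mann",
                             "Lina Pearl Right", "David Berg", "Anthony Lee",
                             "Brian Michael Ingraham", "Jay Ling"))

# adding a suffix such as "II" is a matter of extending the vector
with_suffix <- split_first_last(c("Ed Begley, Jr.", "Henry Ford II"),
                                suffixes = c("Jr.", "Jr", "Sr.", "Sr", "II"))
```

Note that a one-word entry ends up with the same value in both columns, so you may want to handle that case separately if it occurs in your data.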

Upvotes: 2
