Reputation: 1
I am using R for data manipulation. I have a very long list of names that looks like this:
"names"
[1] ""
[2] "Victoria Marie"
[3] "Ori Mann"
[4] "Lina Pearl Right"
[5] "David Berg"
[6] "Anthony Lee"
[7] "Brian Michael Ingraham"
[8] "Jay Ling"
I want to extract only the first and last names of the whole list into new columns and discard any middle names. How do I do this? I used the following code:
mat = matrix(unlist(names), ncol=2, byrow=TRUE)
But this just runs through every word of every entry and fills them into two columns in order, so first, middle, and last names end up mixed together.
Any help would be greatly appreciated.
Upvotes: 0
Views: 2511
Reputation: 5335
Here's a way to do this in base R that also deals with the possibility of suffixes. If you discover additional suffixes (e.g., 'II'), you can add them to the vector that follows %in%.
# some representative data
names <- list("", "Ed Smith", "Jennifer Jason Leigh", "Ed Begley, Jr.")
# use strsplit to get a list of vectors of each name broken into its parts,
# keying off the space between names
names.split <- strsplit(unlist(names), " ")
# make new vectors with the first and last names, based on their position in
# those vectors. for last names, make the result conditional on whether or
# not a recognized suffix is in the last spot, and get rid of any
# punctuation attached to the last name if there was a suffix.
name.first <- sapply(names.split, function(x) x[1])
name.last <- sapply(names.split, function(x)
  # this deals with empty name slots in your original list, returning NA
  if (length(x) == 0) {
    NA
  # now check for a suffix; if one is there, use the penultimate item
  # after stripping it of any punctuation
  } else if (x[length(x)] %in% c("Jr.", "Jr", "Sr.", "Sr")) {
    gsub("[[:punct:]]", "", x[length(x) - 1])
  } else {
    x[length(x)]
  })
Results:
> name.first
[1] NA "Ed" "Jennifer" "Ed"
> name.last
[1] NA "Smith" "Leigh" "Begley"
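Since the question asks for the first and last names as new columns, the same logic could be wrapped in a function that returns a data frame. This is only a sketch: `split_names`, `full.names`, and `name.df` are hypothetical names made up for illustration, the sample data is assumed, and the suffix list is the same short one used above, so extend it as needed.

```r
# Assumed sample data, modeled on the names in the question and the answer
full.names <- c("", "Victoria Marie", "Lina Pearl Right", "Ed Begley, Jr.")

# Hypothetical helper: split full names and return first/last as columns
split_names <- function(x, suffixes = c("Jr.", "Jr", "Sr.", "Sr")) {
  # split each name on single spaces; an empty string yields character(0)
  parts <- strsplit(x, " ", fixed = TRUE)

  # first name: first piece, or NA for an empty entry
  first <- vapply(parts, function(p) {
    if (length(p) == 0) NA_character_ else p[1]
  }, character(1))

  # last name: last piece, unless that piece is a recognized suffix,
  # in which case take the piece before it and strip its punctuation
  last <- vapply(parts, function(p) {
    n <- length(p)
    if (n == 0) return(NA_character_)
    if (n > 1 && p[n] %in% suffixes) {
      return(gsub("[[:punct:]]", "", p[n - 1]))
    }
    p[n]
  }, character(1))

  data.frame(full = x, first = first, last = last, stringsAsFactors = FALSE)
}

name.df <- split_names(full.names)
print(name.df)
```

Using `vapply` with `character(1)` instead of `sapply` guarantees a character vector even if every entry is empty. If your list is stored as a list rather than a character vector, pass `unlist(names)` as in the code above.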
Upvotes: 2