Alisher Abdullaev

Reputation: 21

Splitting surnames from full names

I've used this:

String <- unlist(str_split(Invname,"[ ]",n=2))

This splits my names into surnames and first names, since the surnames come first. But I cannot figure out how to assign the split Invname to two separate variables, so that I can use only the surnames for the rest of my project. Right now I have this:

[471] "KRUEGER" "MARCUS"

I would like to assign only the left side (the surname) to a new variable, so that I can go on to mine the surnames for information.

Upvotes: 2

Views: 210

Answers (6)

xxxpipxxx

Reputation: 13

If only names were so straightforward! If the strings are all formatted the same way, then yes, the other answers are good options. In my experience, name lists contain hyphenated names (in both the "first" and "last" positions), middle names, titles and abbreviations (Dr., Mr, Md), and many other variants. I first try to clean the strings before doing any splitting.

Here is just one idea using dplyr (explicit code provided for clarity):

library(dplyr)

Invnames <- c("Krueger Markus","Doe John","Tatum Jayson", "Taylor - Cline Jeff", "Davis - Freud Melvin- John")

df <- data.frame(Invnames = Invnames) %>%
  mutate(Invnames2 = gsub("- ", "-", Invnames)) %>%   # remove the space after a hyphen
  mutate(Invnames2 = gsub(" -", "-", Invnames2)) %>%  # remove the space before a hyphen
  mutate(surname = gsub(" .*", "", Invnames2))        # keep the text before the first space
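The same clean-first idea can be extended to titles. A minimal base R sketch, assuming titles only appear at the start of a string and that the set of titles is known in advance; `raw`, `titles` and `title_pattern` are hypothetical names and the input is made up for illustration:

```r
# Hypothetical input: some entries carry a leading title
raw <- c("Dr. Krueger Markus", "Doe John", "Mr Taylor - Cline Jeff")

# Assumed list of titles to strip; extend it for your own data
titles <- c("Dr", "Mr", "Md")
title_pattern <- paste0("^(", paste(titles, collapse = "|"), ")\\.?\\s+")

cleaned <- sub(title_pattern, "", raw)      # drop a leading title, if any
cleaned <- gsub("\\s*-\\s*", "-", cleaned)  # "Taylor - Cline" becomes "Taylor-Cline"
surname <- sub(" .*", "", cleaned)          # keep the text before the first space
surname
```

Because the pattern requires whitespace after the title, a surname that merely starts with the same letters is left alone.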

Upvotes: 0

akrun

Reputation: 887611

With base R, we can use read.table/read.csv to separate the strings into columns (read.table splits on whitespace by default):

read.table(text = Invnames, header = FALSE, col.names = c("Surnames", "Firstnames"))
#  Surnames Firstnames
#1  Krueger     Markus
#2      Doe       John
#3    Tatum     Jayson

data

Invnames <- c("Krueger Markus","Doe John","Tatum Jayson")

Upvotes: 1

NColl

Reputation: 757

Again using the data from an earlier answer, this time with separate() from tidyr (loaded here as part of the tidyverse):

library(tidyverse)

Invnames <- c("Krueger Markus","Doe John","Tatum Jayson")
Invnames <- data.frame(Invnames)

Invnames %>%
  separate(Invnames, c('Surname', 'FirstName'), sep=" ")

  Surname FirstName
1 Krueger    Markus
2     Doe      John
3   Tatum    Jayson

Upvotes: 1

NelsonGon

Reputation: 13319

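Here is some sample data and a suggested solution. The data are modified from @Rui Barradas' answer. Splitting on runs of non-word characters and taking the first piece of each split gives the surnames:

Invnames <- c("Krueger.$Markus","Doe.John","Tatum.Jayson")
sapply(strsplit(Invnames,"\\W+"),"[",1)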

Upvotes: 1

Rui Barradas

Reputation: 76605

Using the data in nate.edwinton's answer, there is no need to unlist. Keep the list that str_split returns and extract the first and second element of each entry:

Invnames <- c("Krueger Markus","Doe John","Tatum Jayson")

String <- stringr::str_split(Invnames, "[ ]", n = 2)
Surnames <- sapply(String, '[', 1)
Firstnames <- sapply(String, '[', 2)
data.frame(Surnames, Firstnames)
#  Surnames Firstnames
#1  Krueger     Markus
#2      Doe       John
#3    Tatum     Jayson
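
For readers without stringr, here is a minimal sketch of the same approach in base R. It assumes that `regmatches()` with `invert = TRUE` and a `regexpr()` match is an acceptable stand-in for `str_split(..., n = 2)`, since `regexpr()` finds only the first space. `parts` is a hypothetical name, and the fourth input is made up to show a name with more than two words:

```r
Invnames <- c("Krueger Markus", "Doe John", "Tatum Jayson", "Smith Anna Maria")

# regexpr() matches only the first space, so inverting the match
# yields the text before it and the text after it
parts <- regmatches(Invnames, regexpr(" ", Invnames), invert = TRUE)

# vapply() insists that every extracted piece is a single string
Surnames   <- vapply(parts, `[`, character(1), 1)
Firstnames <- vapply(parts, `[`, character(1), 2)

data.frame(Surnames, Firstnames)
```

`Surnames` is then an ordinary character vector that can be used on its own for the rest of the analysis.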

Upvotes: 2

niko

Reputation: 5281

As mentioned in the comments, it would be easier to help if you provided some data. Anyway, here is a possible solution.

Assuming that Invnames is a character vector in which every entry consists of exactly one last name followed by exactly one first name, you could do the following:

# data
Invnames <- c("Krueger Markus","Doe John","Tatum Jayson")
# extraction
String <- unlist(stringr::str_split(Invnames,"[ ]",n=2))
# saving first and last names
lastNames <- String[seq(1,length(String),2)]
firstNames <- String[seq(2,length(String),2)]
# yields
> cbind(lastNames,firstNames)
     lastNames firstNames
[1,] "Krueger" "Markus"  
[2,] "Doe"     "John"    
[3,] "Tatum"   "Jayson"  
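
The interleaved vector can also be reshaped in one step instead of being indexed with `seq()`. A minimal sketch, using base R's `strsplit()` in place of `stringr::str_split()` and relying on the same assumption of exactly one space per name; `parts` is a hypothetical name:

```r
Invnames <- c("Krueger Markus", "Doe John", "Tatum Jayson")

# Flat vector: last, first, last, first, ...
String <- unlist(strsplit(Invnames, " ", fixed = TRUE))

# Fill a two-column matrix row by row so each name pair lands on one row
parts <- matrix(String, ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("lastNames", "firstNames")))

lastNames <- parts[, "lastNames"]
lastNames
```

If any name contains more than one space, the pairs fall out of step, so this is only safe after the data have been checked or cleaned.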

Upvotes: 1
