Reputation: 73
I am struggling to figure out how to take a single "Name" column in a data frame and split it into two other columns, FirstName and LastName, within the same data frame. The challenge is that some of my names have several last names. Essentially, I want to take the first word (or element of the string) and put it in the FirstName column, then put all following text (minus the separating space, of course) into the LastName column.
This is my DataFrame "tteam"
NAME <- c('John Doe','Peter Gynn','Jolie Hope-Douglas', 'Muhammad Arnab Halwai')
TITLE <- c("assistant", "manager", "assistant", "specialist")
tteam<- data.frame(NAME, TITLE)
My desired output would look like this:
FirstName <- c("John", "Peter", "Jolie", "Muhammad")
LastName <- c("Doe", "Gynn", "Hope-Douglas", "Arnab Halwai")
tteamdesire <- data.frame(FirstName, LastName, TITLE)
I have tried the following code to create a new data frame of just names, which allows me to extract the first names from the first column. However, I am unable to get the last names into a consistent column.
names <- tteam$NAME ## puts full names into names vector
namesdf <- data.frame(do.call('rbind', strsplit(as.character(names),' ',fixed=TRUE)))
## splits out all names into a dataframe PROBLEM IS HERE!
Upvotes: 7
Views: 25881
Reputation: 47340
You could use the unglue package:
library(unglue)
unglue_unnest(tteam, NAME, "{FirstName} {LastName}")
#> TITLE FirstName LastName
#> 1 assistant John Doe
#> 2 manager Peter Gynn
#> 3 assistant Jolie Hope-Douglas
#> 4 specialist Muhammad Arnab Halwai
Upvotes: 2
Reputation: 269905
1) sub
data.frame(FirstName = sub(" .*", "", tteam$NAME),
LastName = sub("^\\S* ", "", tteam$NAME),
tteam[-1])
2) gsubfn::read.pattern
In the NAME <- line below, we can omit as.character if NAME is already character (as opposed to factor):
library(gsubfn)
cn <- c("FirstName", "LastName")
NAME <- as.character(tteam$NAME)
cbind( read.pattern(text = NAME, pattern = "^(\\S*) (.*)", col.names = cn), tteam[-1])
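One edge case worth checking with the sub approach: if a name contains no space at all, neither pattern matches, so both columns end up holding the whole string. Below is a minimal sketch of one way to handle that, assuming you would rather have NA as the last name in that situation. The single-word name "Cher" is a made-up input for illustration and is not part of the question's data.

```r
# Hypothetical input: the second entry has no space at all
NAME <- c("John Doe", "Cher", "Muhammad Arnab Halwai")

# Same two patterns as in solution 1
first <- sub(" .*", "", NAME)      # drop everything from the first space on
last  <- sub("^\\S* ", "", NAME)   # drop the first word and the space after it

# Without a space nothing is removed, so `last` would repeat the full name;
# replace it with NA in those rows (an assumption about the desired result)
last[!grepl(" ", NAME, fixed = TRUE)] <- NA_character_

data.frame(FirstName = first, LastName = last)
```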
Update: Updated the solution to be in terms of tteam and added a second solution.
Upvotes: 3
Reputation: 24593
Try:
> firstname = sapply(strsplit(NAME, ' '), function(x) x[1])
> firstname
[1] "John" "Peter" "Jolie" "Muhammad"
> lastname = sapply(strsplit(NAME, ' '), function(x) x[length(x)])
> lastname
[1] "Doe" "Gynn" "Hope-Douglas" "Halwai"
or:
> ll = strsplit(NAME, ' ')
>
> firstname = sapply(ll, function(x) x[1])
> lastname = sapply(ll, function(x) x[length(x)])
>
> firstname
[1] "John" "Peter" "Jolie" "Muhammad"
> lastname
[1] "Doe" "Gynn" "Hope-Douglas" "Halwai"
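Note that x[length(x)] keeps only the final word, which is why "Muhammad Arnab Halwai" yields "Halwai" rather than the "Arnab Halwai" the question asks for. A minimal sketch of a variant that keeps everything after the first word, using the same strsplit idea (vapply is swapped in for sapply only to guarantee a character result):

```r
# Example names from the question
NAME <- c("John Doe", "Peter Gynn", "Jolie Hope-Douglas", "Muhammad Arnab Halwai")

# Split each name on single spaces
parts <- strsplit(NAME, " ", fixed = TRUE)

# The first word becomes the first name
firstname <- vapply(parts, function(x) x[1], character(1))

# All remaining words, re-joined with spaces, become the last name
lastname <- vapply(parts, function(x) paste(x[-1], collapse = " "), character(1))

lastname
# [1] "Doe"          "Gynn"         "Hope-Douglas" "Arnab Halwai"
```

For a name with no space, x[-1] is empty and paste returns an empty string, so the last name comes out as "" rather than NA.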
Upvotes: 5
Reputation: 887621
You could use extract from tidyr:
library(tidyr)
extract(tteam, NAME, c("FirstName", "LastName"), "([^ ]+) (.*)")
# FirstName LastName TITLE
#1 John Doe assistant
#2 Peter Gynn manager
#3 Jolie Hope-Douglas assistant
#4 Muhammad Arnab Halwai specialist
Upvotes: 8