Reputation: 103
I have full-name data and have used strsplit() to split each name into its elements.
# Dataframe with a `names` column (complete names)
df <- data.frame(
names =
c("Adam, R, Goldberg, MALS, MBA",
"Adam, R, Goldberg, MEd",
"Adam, S, Metsch, MBA",
"Alan, Haas, MSW",
"Alexandra, Dumas, Rhodes, MA",
"Alexandra, Ruttenberg, PhD, MBA"),
stringsAsFactors=FALSE)
# Add a column with the split names (it is actually a list)
df$splitnames <- strsplit(df$names, ', ')
I also have a list of degrees:
degrees<-c("EdS","DEd","MEd","JD","MS","MA","PhD","MSPH","MSW","MSSA","MBA",
"MALS","Esq","MSEd","MFA","MPA","EdM","BSEd")
I would like to get the intersection of each name with the list of degrees.
I'm not sure how to flatten the name list so I can compare the two vectors using intersect(). When I tried unlist(df$splitnames, recursive=FALSE),
it returned each element separately. Any help is appreciated.
Upvotes: 0
Views: 194
Reputation: 2361
Try
df$intersect <- lapply(X=df$splitnames, FUN=intersect, y=degrees)
That gives you a list holding the intersection of each element of df$splitnames with degrees (e.g. the first entry is intersect(df$splitnames[[1]], degrees)). If you want it as a character vector instead:
sapply(X=df$intersect, FUN=paste, collapse=', ')
I assume you need it as a vector, since the complete names probably came from one (for instance, a data frame column), whereas strsplit() returns a list.
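Putting the steps together, here is a minimal self-contained sketch using three of the names from the question. The column name degree_string, the regular expression ",\\s*" (to tolerate missing spaces after commas), and the use of vapply() instead of sapply() are my own choices for illustration, not part of the original code.

```r
# Three of the sample names from the question
df <- data.frame(
  names = c("Adam, R, Goldberg, MALS, MBA",
            "Adam, R, Goldberg, MEd",
            "Alan, Haas, MSW"),
  stringsAsFactors = FALSE
)
degrees <- c("EdS", "DEd", "MEd", "JD", "MS", "MA", "PhD", "MSPH", "MSW",
             "MSSA", "MBA", "MALS", "Esq", "MSEd", "MFA", "MPA", "EdM", "BSEd")

# Split on a comma followed by optional whitespace (assumption: spacing may vary)
df$splitnames <- strsplit(df$names, ",\\s*")

# One intersection per row, stored as a list column
df$intersect <- lapply(df$splitnames, intersect, y = degrees)

# Collapse each row's degrees into one string; vapply() guarantees a
# character vector with exactly one element per row
df$degree_string <- vapply(df$intersect, paste, character(1), collapse = ", ")

print(df$degree_string)
```

A name with no matching degree ends up as an empty string in degree_string, because paste() with collapse returns "" for zero-length input.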
Does that work? If not, please try to clarify your intention.
Good luck!
Upvotes: 3
Reputation: 121568
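To continue with your approach, you can use unlist() to flatten the list (note that this pools the elements of all names into a single vector):
hh <- unlist(df$splitnames)
intersect(hh, degrees)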
For example:
ll <- list(c("Adam" , "R" , "Goldberg" ,"MALS" , "MBA "),
c("Adam" , "R" , "Goldberg", "MEd" ))
hh <- unlist(ll)
intersect(hh, degrees)
[1] "MALS" "MEd"
or, equivalently here (intersect() also drops duplicates):
hh[hh %in% degrees]
[1] "MALS" "MEd"
To get the differences, you can use
setdiff(hh, degrees)
[1] "Adam"     "R"        "Goldberg" "MBA "
Note that "MBA " is not matched because of its trailing space.
...
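If stray whitespace such as the trailing space in "MBA " is unintended, one option is to trim each element before intersecting. This is only a sketch: the shortened degrees vector and the names cleaned and matched are made up for illustration, and it intersects each name separately rather than pooling them.

```r
# Shortened degree list, for illustration only
degrees <- c("MEd", "MBA", "MALS")

ll <- list(c("Adam", "R", "Goldberg", "MALS", "MBA "),
           c("Adam", "R", "Goldberg", "MEd"))

# Strip leading/trailing whitespace from every element of every name
cleaned <- lapply(ll, trimws)

# Intersect each cleaned name with the degree list
matched <- lapply(cleaned, intersect, y = degrees)

print(matched)
```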
Upvotes: 0