Reputation: 4478
I'm struggling to come up with a vectorised solution to the following problem. I have two dataframes:
> people <- data.frame(name = c('Fred', 'Bob'), profession = c('Builder', 'Baker'))
> people
  name profession
1 Fred    Builder
2  Bob      Baker
> allowed <- data.frame(name = c('Fred', 'Fred', 'Bob', 'Bob'), profession = c('Builder', 'Baker', 'Barman', 'Biker'))
> allowed
  name profession
1 Fred    Builder
2 Fred      Baker
3  Bob     Barman
4  Bob      Biker
The allowed data frame lists the professions each person is permitted to have. I want to check that every person in people has a permitted profession, and return the names of those who do not.
For instance, Fred can be a Builder or a Baker, and so he is fine. However, Bob can be a Barman or a Biker, but not a Baker (note: there are only ever two permitted professions in my use case).
I would like to return a data frame of those names which do not have a permitted profession:
  name profession permitted
1  Bob      Baker     Biker
2  Bob      Baker    Barman
Thanks for the help
Upvotes: 0
Views: 104
Reputation: 49448
Here's a slightly more readable data.table solution. You can do the last step on the same line as well to make it a one-liner, if you consider that readable.
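# load library, convert people to a data.table and set a key
library(data.table)
people = data.table(people, key = "name,profession")
# compute: people[!allowed] is a not-join that keeps the people whose
# (name, profession) pair is absent from allowed; joining allowed to that
# result on name then attaches each of their permitted professions
result = data.table(allowed, key = "name")[people[!allowed]]
# the second column comes from allowed and the third from people, so rename by
# position (the third column is profession.1 or i.profession depending on the
# data.table version)
setnames(result, c("name", "permitted", "profession"))
result
# name permitted profession
#1: Bob Barman Baker
#2: Bob Biker Baker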
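With a reasonably recent data.table, the same two steps (a not-join, then a join back on name) can probably be written without setting keys, using on= instead. This is only a sketch under that assumption; bad is a name made up for the example, and the data is rebuilt from the question so the block runs on its own.

```r
library(data.table)

# Data from the question, built as data.tables directly
people  <- data.table(name = c("Fred", "Bob"),
                      profession = c("Builder", "Baker"))
allowed <- data.table(name = c("Fred", "Fred", "Bob", "Bob"),
                      profession = c("Builder", "Baker", "Barman", "Biker"))

# Not-join: people whose (name, profession) pair does not appear in allowed
bad <- people[!allowed, on = c("name", "profession")]

# Join allowed back on name; in j, the x. prefix refers to columns of allowed
# and the i. prefix to columns of bad
result <- allowed[bad, on = "name",
                  .(name, profession = i.profession, permitted = x.profession)]
result
```

Naming the columns explicitly in j avoids relying on whatever default name the clashing profession column would otherwise receive.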
Upvotes: 1
Reputation: 580
This is my take on it. It may need some more testing, though, and I'd be open to suggestions myself. It works with your example, but I am not sure whether it would generalize.
# compare (name, profession) pairs; allowed$name == people$name would recycle
# the shorter vector and only line up by coincidence
people$check <- paste(people$name, people$profession, sep = "\t") %in% paste(allowed$name, allowed$profession, sep = "\t")
# keep the people whose profession is not permitted, as the question asks
people_select <- people[!people$check,]
EDIT: and just for clarification, in case this is holding you back from voting: the %in% check is vectorized and will run very fast.
Upvotes: 0
Reputation: 6197
Probably there's another way, but this should work. I added a third person with an unpermitted profession to show you how to apply the function to the entire dataset.
currentprof <-structure(list(name = structure(c(2L, 1L, 3L), .Label = c("Bob",
"Fred", "Jan"), class = "factor"), profession = structure(c(3L,
2L, 1L), .Label = c("Analyst", "Baker", "Builder"), class = "factor")), .Names = c("name",
"profession"), class = "data.frame", row.names = c(NA, -3L))
allowed <- structure(list(name = structure(c(2L, 2L, 1L, 1L, 3L, 3L), .Label = c("Bob",
"Fred", "Jan"), class = "factor"), profession = structure(c(4L,
1L, 2L, 3L, 6L, 5L), .Label = c("Baker", "Barman", "Biker", "Builder",
"Driver", "Teacher"), class = "factor")), .Names = c("name",
"profession"), class = "data.frame", row.names = c(NA, -6L))
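checkprof <- function(name){
  # permitted and current professions for this person
  allowedn <- allowed[allowed$name == name,]
  currentprofn <- currentprof[currentprof$name == name,]
  if(!currentprofn$profession %in% allowedn$profession) {
    # not permitted: pair the current profession with each permitted one
    result <- merge(currentprofn, allowedn, by = "name", all.x = TRUE)
  } else {
    # permitted: contribute an empty data frame
    result <- data.frame(col1 = character(),
                         col2 = character(),
                         col3 = character(),
                         stringsAsFactors = FALSE)
  }
  colnames(result) <- c("name", "profession", "permitted")
  return(result)
}
# apply the check to every person and stack the results
do.call(rbind, lapply(levels(allowed$name), checkprof))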
Upvotes: 0
Reputation: 57686
Simple base-only solution. I'm sure someone can come up with something better.
out <- allowed[!allowed$name %in% merge(people, allowed)$name, ]
This gets you the desired people, along with their permitted professions. If you also want their actual professions:
names(out)[2] <- "permitted"
out <- merge(people, out, all.y=TRUE)
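For reference, here is how the two steps above fit together on the data from the question. It is a sketch rather than a tested general solution: the intermediate name ok and the stringsAsFactors = FALSE argument are additions for the example, and it assumes each person appears only once in people.

```r
# Data from the question (stringsAsFactors = FALSE is an assumption made here
# so that the columns are plain character vectors in every R version)
people <- data.frame(name = c("Fred", "Bob"),
                     profession = c("Builder", "Baker"),
                     stringsAsFactors = FALSE)
allowed <- data.frame(name = c("Fred", "Fred", "Bob", "Bob"),
                      profession = c("Builder", "Baker", "Barman", "Biker"),
                      stringsAsFactors = FALSE)

# merge() joins on both shared columns, so only permitted
# (name, profession) pairs survive; here that is Fred the Builder
ok <- merge(people, allowed)$name

# Rows of allowed belonging to everyone else
out <- allowed[!allowed$name %in% ok, ]
names(out)[2] <- "permitted"

# Attach each remaining person's actual profession by joining on name
out <- merge(people, out, all.y = TRUE)
out
```

The result has the columns name, profession and permitted, with one row per permitted profession of each person who fails the check.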
Upvotes: 1