rwb

Reputation: 4478

Check whether value in one dataframe is in another (larger) dataframe

I'm struggling to come up with a vectorised solution to the following problem. I have two dataframes:

> people <- data.frame(name = c('Fred', 'Bob'), profession = c('Builder', 'Baker'))
> people
  name profession
1 Fred    Builder
2  Bob      Baker

> allowed <- data.frame(name = c('Fred', 'Fred', 'Bob', 'Bob'), profession = c('Builder', 'Baker', 'Barman', 'Biker'))
> allowed
  name profession
1 Fred    Builder
2 Fred      Baker
3  Bob     Barman
4  Bob      Biker

I want to check that every person in people has a permitted profession (as listed in allowed), and return any names which do not.

For instance, Fred can be a Builder or a Baker, and so he is fine. However, Bob can be a Barman or a Biker, but not a Baker (note: there are only ever two permitted professions in my use case).

I would like to return a data frame of those names which do not have a permitted profession:

  name profession permitted
1  Bob      Baker     Biker
2  Bob      Baker    Barman

Thanks for the help

Upvotes: 0

Views: 104

Answers (4)

eddi

Reputation: 49448

Here's a slightly more readable data.table solution. You can do the last step on the same line as well to make it a one-liner, if you consider that readable.

# load library, convert people to a data.table and set a key
library(data.table)
people = data.table(people, key = "name,profession")

# compute
result = data.table(allowed, key = "name")[people[!allowed]]
setnames(result, "profession.1", "permitted")

result
#   name profession permitted
#1:  Bob     Barman     Baker
#2:  Bob      Biker     Baker

Upvotes: 1

SprengMeister

Reputation: 580

This is my take on it. It may need some more testing, though, and I'd be open to suggestions myself. It works with your example, but I am not sure whether it would generalize.

people$check <- ifelse(people$profession %in% allowed[which(allowed$name == people$name),"profession"], TRUE,FALSE)

people_select <- people[people$check == TRUE,]

EDIT: and just for clarification in case this is holding you back from voting. The ifelse is vectorized and will run very fast.
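
One caveat with the comparison above: allowed$name == people$name recycles the shorter vector instead of matching each person to their own rows, so it may line up only by coincidence on the example data. Below is a minimal base-R sketch of a pairwise check, assuming both frames have name and profession columns as in the question. The key helper and the "\r" separator are illustrative choices, not part of the original answer.

```r
# Example data from the question (character columns, not factors)
people  <- data.frame(name = c("Fred", "Bob"),
                      profession = c("Builder", "Baker"),
                      stringsAsFactors = FALSE)
allowed <- data.frame(name = c("Fred", "Fred", "Bob", "Bob"),
                      profession = c("Builder", "Baker", "Barman", "Biker"),
                      stringsAsFactors = FALSE)

# Hypothetical helper: build one key per (name, profession) pair.
# Assumes the separator never occurs in the data.
key <- function(df) paste(df$name, df$profession, sep = "\r")

# TRUE where the person's actual profession is one of their permitted ones
people$check <- key(people) %in% key(allowed)

# People whose profession is NOT permitted
not_permitted <- people[!people$check, ]
```

Because each row is compared as a whole pair, the result should not depend on the row order or relative lengths of the two frames.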

Upvotes: 0

Jonas Tundo

Reputation: 6197

Probably there's another way, but this should work. I added a third person with an unpermitted profession to show you how to apply the function to the entire dataset.

currentprof <-structure(list(name = structure(c(2L, 1L, 3L), .Label = c("Bob", 
"Fred", "Jan"), class = "factor"), profession = structure(c(3L, 
2L, 1L), .Label = c("Analyst", "Baker", "Builder"), class = "factor")), .Names = c("name", 
"profession"), class = "data.frame", row.names = c(NA, -3L))

allowed <- structure(list(name = structure(c(2L, 2L, 1L, 1L, 3L, 3L), .Label = c("Bob", 
"Fred", "Jan"), class = "factor"), profession = structure(c(4L, 
1L, 2L, 3L, 6L, 5L), .Label = c("Baker", "Barman", "Biker", "Builder", 
"Driver", "Teacher"), class = "factor")), .Names = c("name", 
"profession"), class = "data.frame", row.names = c(NA, -6L))

checkprof <- function(name){
  # rows of each table that belong to this person
  allowedn <- allowed[allowed$name == name,]
  currentprofn <- currentprof[currentprof$name == name,]
  if(!currentprofn$profession %in% allowedn$profession) {
    # not permitted: pair the current profession with every permitted one
    result <- merge(currentprofn, allowedn, by = "name", all.x = TRUE)
  } else {
    # permitted: return an empty frame so that rbind() drops this person
    result <- data.frame(col1 = character(),
                         col2 = character(),
                         col3 = character(),
                         stringsAsFactors = FALSE)
  }
  colnames(result) <- c("name", "profession", "permitted")
  return(result)
}


do.call(rbind,lapply(levels(allowed$name),checkprof))

Upvotes: 0

Hong Ooi

Reputation: 57686

Simple base-only solution. I'm sure someone can come up with something better.

out <- allowed[!allowed$name %in% merge(people, allowed)$name, ]

This gets you the desired people, along with their permitted professions. If you also want their actual professions:

names(out)[2] <- "permitted"
out <- merge(people, out, all.y=TRUE)
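
For reference, here are the two steps run end to end on the question's data, as a self-contained sketch. It assumes the column order name, profession in both frames. The intermediate name ok and the stringsAsFactors = FALSE argument are additions for illustration only.

```r
# Example data from the question
people  <- data.frame(name = c("Fred", "Bob"),
                      profession = c("Builder", "Baker"),
                      stringsAsFactors = FALSE)
allowed <- data.frame(name = c("Fred", "Fred", "Bob", "Bob"),
                      profession = c("Builder", "Baker", "Barman", "Biker"),
                      stringsAsFactors = FALSE)

# merge() with no 'by' joins on all shared columns (name and profession),
# so it keeps only the people whose profession is permitted
ok <- merge(people, allowed)

# Permitted professions of everyone who is not in that set
out <- allowed[!allowed$name %in% ok$name, ]

# Rename, then join back on name to attach the actual profession
names(out)[2] <- "permitted"
out <- merge(people, out, all.y = TRUE)
```

Note that with all.y = TRUE, anyone listed in allowed but absent from people also appears in the result, with NA as their profession.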

Upvotes: 1
