Reputation: 409
I have a simple question. I have a dataset that I need to filter by certain parameters, and I'm hoping for a solution in R.
Dummy case:
colour age animal
red 10 dog
yellow 5 cat
pink 6 cat
I want to classify this dataset e.g. by:
If (colour is 'red' OR 'pink') AND age is < 7 AND animal is 'cat', then category 1. Otherwise category 2.
Output would be:
colour age animal category
red 10 dog 2
yellow 5 cat 2
pink 6 cat 1
Is there a way to use dplyr to achieve this? I'm a clinician, not a bioinformatician, so go easy!
Upvotes: 0
Views: 57
Reputation: 1165
I like the case_when function in dplyr to set up more complex selections with mutate.
library(tidyverse)
df <- data.frame(colour = c("red", "yellow", "pink", "red", "pink"),
age = c(10, 5, 6, 12, 10),
animal = c("dog", "cat", "cat", "hamster", "cat"))
df
#> colour age animal
#> 1 red 10 dog
#> 2 yellow 5 cat
#> 3 pink 6 cat
#> 4 red 12 hamster
#> 5 pink 10 cat
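# case_when() tests the conditions in order; the first TRUE one sets the category.
df <- mutate(df, category = case_when(
  ((colour == "red" | colour == "pink") & age < 7 & animal == "cat") ~ 1,
  # & binds tighter than |, so this reads: yellow OR (age != 5 AND dog)
  (colour == "yellow" | (age != 5 & animal == "dog")) ~ 2,
  (colour == "pink" | animal == "cat") ~ 3,
  TRUE ~ 4))  # fallback for rows that match none of the conditions above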
df
#> colour age animal category
#> 1 red 10 dog 2
#> 2 yellow 5 cat 2
#> 3 pink 6 cat 1
#> 4 red 12 hamster 4
#> 5 pink 10 cat 3
Created on 2021-01-17 by the reprex package (v0.3.0)
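For the two-category rule exactly as stated in the question, a shorter sketch with dplyr's if_else may be enough. This assumes the three-row example data from the question, and it is only one possible way to write it, not necessarily what the answer above intended.

```r
library(dplyr)

# Example data copied from the question
df <- data.frame(
  colour = c("red", "yellow", "pink"),
  age    = c(10, 5, 6),
  animal = c("dog", "cat", "cat")
)

# Category 1 only when all three conditions hold; every other row gets category 2
df <- mutate(df, category = if_else(
  colour %in% c("red", "pink") & age < 7 & animal == "cat", 1, 2
))

df$category  # 2 2 1
```

if_else is enough when there are only two outcomes; case_when becomes more convenient once a third or fourth category is needed.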
Upvotes: 1
Reputation: 388982
You could also do this in base R:
df$category <- with(df,!(colour %in% c('red', 'pink') & age < 7 & animal == 'cat')) + 1
df
# colour age animal category
#1 red 10 dog 2
#2 yellow 5 cat 2
#3 pink 6 cat 1
And in dplyr:
df %>%
mutate(category = as.integer(!(colour %in% c('red', 'pink') &
age < 7 & animal == 'cat')) + 1)
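To see why negating the condition and adding 1 works, here is a step-by-step sketch in base R. It assumes the three-row example data from the question; the intermediate variable name matches is only for illustration.

```r
# Example data copied from the question
df <- data.frame(
  colour = c("red", "yellow", "pink"),
  age    = c(10, 5, 6),
  animal = c("dog", "cat", "cat")
)

# TRUE where a row meets all three criteria
matches <- with(df, colour %in% c("red", "pink") & age < 7 & animal == "cat")
matches                  # FALSE FALSE  TRUE

# Negate, so that non-matching rows are TRUE
!matches                 # TRUE  TRUE FALSE

# TRUE converts to 1 and FALSE to 0; adding 1 gives 2 for non-matches and 1 for matches
category <- as.integer(!matches) + 1
category                 # 2 2 1
```

Note that the + 1 is applied after the negation has been wrapped in as.integer() (or with()); writing !x + 1 directly would negate x + 1 instead, because ! has lower precedence than + in R.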
Upvotes: 1