Reputation: 5897
I am having a lot of difficulty placing rows from a dataset into "bins". For example, suppose I have a data frame "df" with columns "var1" and "var2":
I want to create a new variable called "var3" that follows this logic (R code):
1) if var1 <5 and var2<5 .... then var3 = "a"
2) if var1 between (5,10) and var2 between (5,10) .... then var3 = "b"
3) if var1 > 10 and var2 > 10 .... then var3 = "c"
From a previous question I posted (If statements with multiple ranges (R)), I tried the following logic:
library(dplyr)
df %>%
  mutate(var3 = case_when(var1 < 5 & var2 < 5 ~ 'a',
                          var1 > 5 & var1 < 10 & var2 > 5 & var2 < 10 ~ 'b',
                          var1 > 10 & var2 > 10 ~ 'c'))
But when I inspect df$var3, the logic does not seem to be correct: some entries of var3 do not have any value. (Note: the smallest possible value of var1 and var2 is 0.)
Can someone please help me?
Thanks
UPDATE:
Sample dataset:
a <- rnorm(50,10,10)
b <- rnorm(50, 2,8)
var1 = abs(a)
var2 = abs(b)
df = data.frame(var1, var2)
Upvotes: 2
Views: 915
Reputation: 26218
Try this:
library(dplyr)
set.seed(123)
df <- data.frame(var1 = round(runif(100)*20, 0),
var2 = round(runif(100)*20, 0))
df <- df %>%
  mutate(var3 = ifelse(var1 <= 5 & var2 <= 5, "a",
                ifelse(var1 <= 10 & var2 <= 10, "b", "c")))
To check:
library(ggplot2)
df %>%
ggplot() + geom_point(aes(x=var1, y= var2, color= var3))
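A non-graphical check is also possible: count the categories and confirm that no row was left without a value. This is only a sketch that rebuilds the same simulated data in base R; `table()` with `useNA = "ifany"` would list an NA bucket if one existed.

```r
# Same simulated data as above
set.seed(123)
df <- data.frame(var1 = round(runif(100) * 20, 0),
                 var2 = round(runif(100) * 20, 0))

# Nested ifelse: the final "c" acts as the catch-all for every remaining row
df$var3 <- ifelse(df$var1 <= 5 & df$var2 <= 5, "a",
           ifelse(df$var1 <= 10 & df$var2 <= 10, "b", "c"))

# Counts per category; an NA column would appear here if any row were unassigned
table(df$var3, useNA = "ifany")

# TRUE would mean at least one row has no category
anyNA(df$var3)
```

Because the last branch of the nested `ifelse` has no condition, every row with non-missing inputs ends up in one of the three categories.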
Upvotes: 3
Reputation: 6776
If you want to use case_when:
library(dplyr)
## make data
set.seed(111)
df = data.frame(var1 = abs(rnorm(50,10,10)), var2 = abs(rnorm(50,2,8)))
## core
df <- df %>%
  mutate(var3 = case_when(var1 < 5 & var2 < 5 ~ 'a',
                          var1 < 10 & var2 < 10 ~ 'b',
                          TRUE ~ 'c'))
## plot to check
with(df, plot(var1, var2, col = c(2:4)[as.numeric(as.factor(var3))], cex = 0.7))
abline(h = c(5, 10), v = c(5, 10), lty = 2)
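As for why the attempt in the question left gaps: case_when() returns NA for any row where none of the conditions is TRUE. The three conditions in the question only cover rows where both variables fall in the same range, so a row such as var1 = 3, var2 = 12, or a value of exactly 5 or 10, matches nothing. A minimal sketch of the difference, using a small made-up data frame `toy` (hypothetical values chosen only to hit each case):

```r
library(dplyr)

# Hypothetical toy data: three "clean" rows, one mixed row, one boundary row
toy <- data.frame(var1 = c(2, 7, 12, 3, 5),
                  var2 = c(3, 8, 15, 12, 5))

toy <- toy %>%
  mutate(
    # Conditions from the question: no fallback, so unmatched rows become NA
    original = case_when(var1 < 5 & var2 < 5 ~ "a",
                         var1 > 5 & var1 < 10 & var2 > 5 & var2 < 10 ~ "b",
                         var1 > 10 & var2 > 10 ~ "c"),
    # Conditions from this answer: checked in order, TRUE catches the rest
    fixed = case_when(var1 < 5 & var2 < 5 ~ "a",
                      var1 < 10 & var2 < 10 ~ "b",
                      TRUE ~ "c")
  )

# Number of rows that matched no condition in each version
sum(is.na(toy$original))
sum(is.na(toy$fixed))
```

Note that sending every remaining row to 'c' (including mixed rows such as var1 = 3, var2 = 12) is a choice made by this answer; if mixed rows should be handled differently, add explicit conditions for them before the `TRUE` line.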
Upvotes: 5