Reputation: 19
In my dataset, I have a variable called ref that ranges from 0 to 7. Each participant has a score. I would like to group it so that 0-3 is 'low' and 4-7 is 'high'.
So I tried to make a new variable using the ifelse function:
control_vs_fast$refsplit <- (ifelse(control_vs_fast$reflection >= 0 & control_vs_fast$reflection <=3, 'low', ifelse(control_vs_fast$reflection >3, 'high', 'no')))
I was wondering if there's a different function I can use so that I don't have to have 'no' as I have no missing values.
Sorry if that was unclear, I'm an R noob :(
EDIT: thanks so much everyone!
Upvotes: 0
Views: 604
Reputation: 160437
This is a place where cut() works well.
control_vs_fast <- data.frame(reflection = c(-1:5))
control_vs_fast
# reflection
# 1 -1
# 2 0
# 3 1
# 4 2
# 5 3
# 6 4
# 7 5
By default, cut() returns labels using mathematical notation for open/closed ends:
cut(control_vs_fast$reflection, c(-Inf, 0, 3, Inf))
# [1] (-Inf,0] (-Inf,0] (0,3] (0,3] (0,3] (3, Inf] (3, Inf]
# Levels: (-Inf,0] (0,3] (3, Inf]
We can drop the labels and get integer codes instead:
cut(control_vs_fast$reflection, c(-Inf, 0, 3, Inf), labels = FALSE)
# [1] 1 1 2 2 2 3 3
or define our own labels:
cut(control_vs_fast$reflection, c(-Inf, 0, 3, Inf), labels = c("no", "low", "high"))
# [1] no no low low low high high
# Levels: no low high
as.character(cut(control_vs_fast$reflection, c(-Inf, 0, 3, Inf), labels = c("no", "low", "high")))
# [1] "no" "no" "low" "low" "low" "high" "high"
Note that when labels = FALSE, all returned values are integers; otherwise they are factors. If you need strings (and/or don't know what factors are), then the last one, with as.character(), gives you strings.
But all of the above incorrectly mark 0 as "no" instead of "low". To work around this, here's a slightly longer alternative. If you use the integer variant, then simple reassignment works as-is; but if you want strings, then the factors will present a small problem, so I'll use the as.character() variant here.
control_vs_fast$refsplit <- as.character(cut(control_vs_fast$reflection, c(0, 3, Inf), labels = c("low", "high"), include.lowest = TRUE))
control_vs_fast
# reflection refsplit
# 1 -1 <NA>
# 2 0 low
# 3 1 low
# 4 2 low
# 5 3 low
# 6 4 high
# 7 5 high
control_vs_fast$refsplit[is.na(control_vs_fast$refsplit)] <- "no"
control_vs_fast
# reflection refsplit
# 1 -1 no
# 2 0 low
# 3 1 low
# 4 2 low
# 5 3 low
# 6 4 high
# 7 5 high
Explanation:
The problem is that the ranges in cut() are either left-open (the default, right = TRUE) or right-open (right = FALSE). With the default, the only way to get a bin that is closed on both ends is to make it the first bin and add include.lowest = TRUE. From here, anything less than 0 (if you have that) will be NA, meaning that it was not within one of the assigned bins.
From there, we use indexed assignment on the values that are NA.
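If the scores really are integers from 0 to 7 with nothing outside that range, as the question states, a shorter variant may be enough. Here is a minimal sketch that uses right = FALSE so each bin is closed on the left, with breaks at 0, 4 and 8. The scores vector is made up for illustration and is not the asker's data.

```r
# Hypothetical scores covering the full 0-7 range (not the asker's data)
scores <- 0:7

# right = FALSE gives the bins [0,4) and [4,8), so 0-3 is "low" and 4-7 is "high"
refsplit <- as.character(cut(scores, breaks = c(0, 4, 8), right = FALSE,
                             labels = c("low", "high")))
refsplit
# [1] "low"  "low"  "low"  "low"  "high" "high" "high" "high"
```

Any value outside [0, 8) would come back as NA here, so the NA reassignment step above is still needed if such values can occur.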
Upvotes: 2
Reputation: 33488
Here is an example with reproducible data:
set.seed(1)
x <- sample(0:7, size = 10, replace = TRUE)
ifelse(x <= 3, 'low', "high")
# [1] "low" "low" "high" "low" "low" "high" "high" "low" "high"
# [10] "low"
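If the grouping will later feed into a model or a plot, it may help to store it as a factor with an explicit level order. This is a sketch under the assumption that there are no missing values; df is a hypothetical data frame, not the asker's.

```r
set.seed(1)
# Hypothetical data frame standing in for the asker's data
df <- data.frame(reflection = sample(0:7, size = 10, replace = TRUE))

# Two-way split, stored as a factor so that "low" sorts before "high"
df$refsplit <- factor(ifelse(df$reflection <= 3, "low", "high"),
                      levels = c("low", "high"))

# Count how many participants fall in each group
table(df$refsplit)
```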
Upvotes: 0
Reputation: 101307
Perhaps you can try the code below
within(
control_vs_fast,
refsplit <- c("high","low")[(reflection <=3)+1]
)
or
within(
control_vs_fast,
refsplit <- ifelse(reflection <=3,"low","high")
)
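The first version relies on logical-to-numeric coercion: TRUE + 1 is 2 and FALSE + 1 is 1, so the comparison selects an element of c("high", "low") by position. Here is a small sketch on a made-up vector. It also assigns the result of within() back, since within() returns a modified copy and does not change the data frame in place.

```r
reflection <- c(0, 3, 4, 7)      # made-up scores
idx <- (reflection <= 3) + 1     # 2 2 1 1
c("high", "low")[idx]            # "low" "low" "high" "high"

# Hypothetical data frame; assign the result of within() to keep the new column
demo <- data.frame(reflection = reflection)
demo <- within(demo, refsplit <- c("high", "low")[(reflection <= 3) + 1])
demo
```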
Upvotes: 0