dquestions

Reputation: 19

How do I use ifelse to group a range of values?

In my dataset, I have a variable called ref that ranges from 0-7. Each participant has a score. I would like to group it so that 0-3 is 'low' and 4-7 is 'high'.

So I tried to make a new variable and tried to use the ifelse function

control_vs_fast$refsplit <- ifelse(control_vs_fast$reflection >= 0 & control_vs_fast$reflection <= 3, 'low', ifelse(control_vs_fast$reflection > 3, 'high', 'no'))

I was wondering if there's a different function I can use so that I don't have to have 'no' as I have no missing values.

Sorry if that was unclear, I'm an R noob :(

EDIT: thanks so much everyone!

Upvotes: 0

Views: 604

Answers (3)

r2evans

Reputation: 160437

This is a place where cut works well.

control_vs_fast <- data.frame(reflection = c(-1:5))
control_vs_fast
#   reflection
# 1         -1
# 2          0
# 3          1
# 4          2
# 5          3
# 6          4
# 7          5

By default, cut returns labels using mathematical notation of open/closed ends:

cut(control_vs_fast$reflection, c(-Inf, 0, 3, Inf))
# [1] (-Inf,0] (-Inf,0] (0,3]    (0,3]    (0,3]    (3, Inf] (3, Inf]
# Levels: (-Inf,0] (0,3] (3, Inf]

We can remove labels and go with integers

cut(control_vs_fast$reflection, c(-Inf, 0, 3, Inf), labels = FALSE)
# [1] 1 1 2 2 2 3 3

or define our own labels

cut(control_vs_fast$reflection, c(-Inf, 0, 3, Inf), labels = c("no", "low", "high"))
# [1] no   no   low  low  low  high high
# Levels: no low high
as.character(cut(control_vs_fast$reflection, c(-Inf, 0, 3, Inf), labels = c("no", "low", "high")))
# [1] "no"   "no"   "low"  "low"  "low"  "high" "high"

Note that when labels=FALSE, all returned values are integers, otherwise they are factors. If you need strings (and/or don't know what factors are), then the last one with as.character gives you strings.

Correction

But all of the above are incorrectly marking 0 as "no" instead of "low". To work around this, here's a slightly longer alternative. If you use the integer variant, then simple reassignment works as-is; but if you want strings, then the factors will present a small problem; I'll use the as.character variant here.

control_vs_fast$refsplit <- as.character(cut(control_vs_fast$reflection, c(0, 3, Inf), labels = c("low", "high"), include.lowest = TRUE))
control_vs_fast
#   reflection refsplit
# 1         -1     <NA>
# 2          0      low
# 3          1      low
# 4          2      low
# 5          3      low
# 6          4     high
# 7          5     high
control_vs_fast$refsplit[is.na(control_vs_fast$refsplit)] <- "no"
control_vs_fast
#   reflection refsplit
# 1         -1       no
# 2          0      low
# 3          1      low
# 4          2      low
# 5          3      low
# 6          4     high
# 7          5     high

Explanation:

The problem is that the ranges in cut are either left-open and right-closed (the default, right=TRUE) or left-closed and right-open (right=FALSE). With the default, the only way to get a bin that is closed on both ends is to make it the first range/bin and add include.lowest=TRUE. From here, anything less than 0 (if you have that) will be NA, meaning that it was not within one of the assigned bins.

From there, we use indexed-assignment based on those that are NA.
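
As a possible alternative, here is a minimal sketch that assumes the scores are whole numbers from 0 to 7, as in the question. Setting right = FALSE makes every bin left-closed, so breaks at 0, 4 and 8 give [0,4) and [4,8) and include.lowest is not needed. The vector name scores is hypothetical and only there for illustration.

```r
# Hypothetical sample scores; assumes integer values in 0..7 as in the question
scores <- c(0, 1, 3, 4, 7)

# right = FALSE makes the bins left-closed and right-open: [0,4) and [4,8)
refsplit <- as.character(
  cut(scores, breaks = c(0, 4, 8), labels = c("low", "high"), right = FALSE)
)
refsplit
# [1] "low"  "low"  "low"  "high" "high"
```

Anything outside 0 to 7 would still come back as NA, so the indexed assignment above remains available if a "no" category is wanted.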

Upvotes: 2

s_baldur

Reputation: 33488

Here is an example with reproducible data:

set.seed(1)
x <- sample(0:7, size = 10, replace = TRUE)

ifelse(x <= 3, 'low', "high")

#  [1] "low"  "low"  "high" "low"  "low"  "high" "high" "low"  "high"
# [10] "low"
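
A small sketch of why the third 'no' branch from the question is not needed with this two-way ifelse: a missing input simply stays NA in the result. The values below are made up for illustration.

```r
# Made-up scores, including one missing value
x <- c(0, 3, 4, 7, NA)

# The comparison NA <= 3 is NA, and ifelse passes that NA through
res <- ifelse(x <= 3, "low", "high")
res
# [1] "low"  "low"  "high" "high" NA
```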

Upvotes: 0

ThomasIsCoding

Reputation: 101307

Perhaps you can try the code below

within(
  control_vs_fast,
  refsplit <- c("high","low")[(reflection <=3)+1]
)

or

within(
  control_vs_fast,
  refsplit <- ifelse(reflection <=3,"low","high")
)
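
To illustrate why the first variant works, here is a short sketch with made-up scores: adding 1 to a logical vector coerces FALSE to 0 and TRUE to 1, so the result can be used as a position into c("high", "low").

```r
# Made-up scores for illustration
reflection <- c(0, 3, 4, 7)

# TRUE + 1 is 2 and FALSE + 1 is 1
idx <- (reflection <= 3) + 1
idx
# [1] 2 2 1 1

# Position 2 is "low", position 1 is "high"
labels <- c("high", "low")[idx]
labels
# [1] "low"  "low"  "high" "high"
```

Note that within() returns a modified copy of the data frame, so the result needs to be assigned back (for example to control_vs_fast) to keep the new column.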

Upvotes: 0
