Virgilio

Reputation: 23

In R, create a new column from an existing column using case_when() with multiple conditional rules

I am trying to add a column. I have a numeric column "Y" with values ranging from -50 to 350, and I would like to create a new column "Z" that labels those values according to these rules: -30 to 30 = "Transition", 31 to 100 = "Early", 101 to 200 = "Mid", 201 to 300 = "Late", and everything else "NA". I am using case_when() inside mutate() from dplyr (see the code below), but I keep getting an error message. Any help will be very much appreciated.

DataSetNew <- DataSet %>%
dplyr::mutate(ColumnZ = case_when(
ColumnY == < = 30 ~ "Transition",
ColumnY == between(31,100) ~ "Early",
ColumnY == between(101,200) ~ "Mid",
ColumnY == between(201,305) ~ "Late",
TRUE ~ "NA"
))
Error: unexpected '<' in:
"  dplyr::mutate(ColumnZ = case_when(
ColumnY == <"

Upvotes: 0

Views: 58

Answers (2)

L Tyrone

Reputation: 6860

You need to pass the column of interest as the first argument to between(), i.e. between(ColumnY, 31, 100) rather than ColumnY == between(31, 100). Also, in your question you state 201-300 == "Late", but in your code the upper threshold for "Late" is 305. This example uses the former.

Also, instead of TRUE ~ for all other values, current versions of dplyr (1.1.0 and later) recommend using the .default argument.
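
To illustrate the difference, here is a minimal sketch comparing the two fallback styles. The vector x is made up for illustration, and the .default form assumes dplyr 1.1.0 or later. It also shows why TRUE ~ "NA" from the question is probably not what you want: "NA" in quotes is an ordinary string, not a missing value.

```r
library(dplyr)

x <- c(10, 50, 999)  # hypothetical values, for illustration only

# Older style: a final TRUE condition catches everything not matched above
old_style <- case_when(x <= 30 ~ "Transition",
                       x <= 100 ~ "Early",
                       TRUE ~ NA_character_)

# Newer style (assumes dplyr >= 1.1.0): .default supplies the fallback value
new_style <- case_when(x <= 30 ~ "Transition",
                       x <= 100 ~ "Early",
                       .default = NA)

# Both styles should give the same character vector
identical(old_style, new_style)

# A quoted "NA" is a string, so is.na() does not treat it as missing
is.na("NA")
# [1] FALSE
is.na(NA_character_)
# [1] TRUE
```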

library(dplyr)

# Sample data
DataSet <- data.frame(id = 1:9,
                      ColumnY = c(-30, 30, 31, 100, 101, 200, 201, 300, 301))

# Return ColumnZ
DataSetNew <- DataSet |>
  mutate(ColumnZ = case_when(between(ColumnY, -Inf, 30) ~ "Transition",
                             between(ColumnY, 31, 100) ~ "Early",
                             between(ColumnY, 101, 200) ~ "Mid",
                             between(ColumnY, 201, 300)  ~ "Late",
                             .default = NA))

DataSetNew
#   id ColumnY    ColumnZ
# 1  1     -30 Transition
# 2  2      30 Transition
# 3  3      31      Early
# 4  4     100      Early
# 5  5     101        Mid
# 6  6     200        Mid
# 7  7     201       Late
# 8  8     300       Late
# 9  9     301       <NA>

This is the equivalent of:

DataSetNew <- DataSet |>
  mutate(ColumnZ = case_when(ColumnY <= 30 ~ "Transition",
                             ColumnY >= 31 & ColumnY <= 100 ~ "Early",
                             ColumnY >= 101 & ColumnY <= 200 ~ "Mid",
                             ColumnY >= 201 & ColumnY <= 300  ~ "Late",
                             .default = NA))
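
One caveat, since ColumnY is numeric: with both versions above, a non-integer value such as 30.5 falls in the gap between 30 and 31 and ends up as NA. If that is not intended, one possible variation relies on case_when() using the first condition that matches, so each rule only needs an upper bound. This is a sketch under that assumption, and GapData and GapResult are hypothetical names used only for this example.

```r
library(dplyr)

# Hypothetical data including non-integer values that fall between the bands
GapData <- data.frame(ColumnY = c(-50, 30, 30.5, 100.2, 200.9, 300, 350))

# Conditions are checked in order, so each rule is reached only
# when all of the rules above it were FALSE
GapResult <- GapData |>
  mutate(ColumnZ = case_when(ColumnY <= 30 ~ "Transition",
                             ColumnY <= 100 ~ "Early",  # implies ColumnY > 30
                             ColumnY <= 200 ~ "Mid",    # implies ColumnY > 100
                             ColumnY <= 300 ~ "Late",   # implies ColumnY > 200
                             .default = NA))

GapResult
```

Here 30.5 is labelled "Early" instead of NA, and only values above 300 fall through to .default.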

Upvotes: 2

Sulfatide

Reputation: 25

It seems there are too many operators chained one after another. Check out the dplyr cheat sheet (search for the PDF) to get an idea of how to use them.

You could try:

library(dplyr)

DataSetNew <- DataSet %>%
  mutate(ColumnZ = case_when(
    ColumnY <= 30 ~ "Transition",
    ColumnY >= 31 & ColumnY <= 100 ~ "Early",   # >= and <= keep the boundary values
    ColumnY >= 101 & ColumnY <= 200 ~ "Mid",
    ColumnY >= 201 & ColumnY <= 305 ~ "Late"    # 305 as in the question's code
  ))
# Rows that match none of the conditions get NA

Upvotes: 2
