Reputation: 23
I am trying to add a column. I have a numeric column “Y” with values ranging from -50 to 350, and I would like to create a new column “Z” that labels each value according to these conditions: -30 to 30 = “Transition”, 31 to 100 = “Early”, 101 to 200 = “Mid”, 201 to 300 = “Late”, and everything else “NA”. I am trying to do this with the case_when function inside the mutate function from dplyr (see code below), but I keep getting an error message. Any help would be very much appreciated.
DataSetNew <- DataSet %>%
dplyr::mutate(ColumnZ = case_when(
ColumnY == < = 30 ~ "Transition",
ColumnY == between(31,100) ~ "Early",
ColumnY == between(101,200) ~ "Mid",
ColumnY == between(201,305) ~ "Late",
TRUE ~ "NA"
))
Error: unexpected '<' in:
" dplyr::mutate(ColumnZ = case_when(
ColumnY == <"
Upvotes: 0
Views: 58
Reputation: 6860
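You need to declare the column of interest inside the between() function: it takes the vector to test as its first argument, e.g. between(ColumnY, 31, 100), rather than being compared to the column with ==. In your question you state 201-300 == "Late", but in your code the upper threshold for "Late" is 305. This example uses the former.
Also, instead of TRUE ~ for all other values, the most recent advice (dplyr 1.1.0 and later) is to use .default = instead.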
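As a minimal sketch of how between() is called, the snippet below compares it with the explicit comparison operators. The vector y and its values are hypothetical, chosen only to sit on and around the 31 to 100 range.

```r
library(dplyr)

# Hypothetical sample values on and around the 31-100 range
y <- c(30, 31, 65.5, 100, 101)

# between(x, left, right): the vector to test goes first, both bounds are inclusive
in_range <- between(y, 31, 100)

# The same test written with explicit comparison operators
in_range_manual <- y >= 31 & y <= 100

in_range
identical(in_range, in_range_manual)
```

Both bounds are inclusive, so 31 and 100 count as inside the range while 30 and 101 do not.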
library(dplyr)
# Sample data
DataSet <- data.frame(id = 1:9,
ColumnY = c(-30, 30, 31, 100, 101, 200, 201, 300, 301))
# Return ColumnZ
DataSetNew <- DataSet |>
  mutate(ColumnZ = case_when(between(ColumnY, -Inf, 30) ~ "Transition",
                             between(ColumnY, 31, 100) ~ "Early",
                             between(ColumnY, 101, 200) ~ "Mid",
                             between(ColumnY, 201, 300) ~ "Late",
                             .default = NA))
DataSetNew
#   id ColumnY    ColumnZ
# 1  1     -30 Transition
# 2  2      30 Transition
# 3  3      31      Early
# 4  4     100      Early
# 5  5     101        Mid
# 6  6     200        Mid
# 7  7     201       Late
# 8  8     300       Late
# 9  9     301       <NA>
This is the equivalent of:
DataSetNew <- DataSet |>
  mutate(ColumnZ = case_when(ColumnY <= 30 ~ "Transition",
                             ColumnY >= 31 & ColumnY <= 100 ~ "Early",
                             ColumnY >= 101 & ColumnY <= 200 ~ "Mid",
                             ColumnY >= 201 & ColumnY <= 300 ~ "Late",
                             .default = NA))
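One thing worth checking, if ColumnY can hold non-integer values, is what happens to numbers that fall between the bins (for example 30.5). The sketch below uses hypothetical sample data, and the names with_between and with_operators are made up for illustration. It runs both versions and compares the results.

```r
library(dplyr)

# Hypothetical data, including values that fall between the bins
DataSet <- data.frame(ColumnY = c(-30, 30, 30.5, 31, 100.5, 300, 301))

# Version using between()
with_between <- DataSet |>
  mutate(ColumnZ = case_when(between(ColumnY, -Inf, 30) ~ "Transition",
                             between(ColumnY, 31, 100) ~ "Early",
                             between(ColumnY, 101, 200) ~ "Mid",
                             between(ColumnY, 201, 300) ~ "Late",
                             .default = NA))

# Version using explicit comparison operators
with_operators <- DataSet |>
  mutate(ColumnZ = case_when(ColumnY <= 30 ~ "Transition",
                             ColumnY >= 31 & ColumnY <= 100 ~ "Early",
                             ColumnY >= 101 & ColumnY <= 200 ~ "Mid",
                             ColumnY >= 201 & ColumnY <= 300 ~ "Late",
                             .default = NA))

# Both versions should label every row the same way
identical(with_between, with_operators)
with_between$ColumnZ
```

In both versions 30.5 and 100.5 match no condition and end up as NA, so if the data are not whole numbers the bin edges may need adjusting.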
Upvotes: 2
Reputation: 25
It seems there are too many operators in a row. Check out the dplyr cheat sheet (a PDF you can find by searching online) to get an idea of how to use them.
You could try:
library(dplyr)
DataSetNew <- DataSet %>%
  mutate(ColumnZ = case_when(
    # One comparison per condition; <= and >= keep the boundary values
    ColumnY <= 30 ~ "Transition",
    ColumnY >= 31 & ColumnY <= 100 ~ "Early",
    ColumnY >= 101 & ColumnY <= 200 ~ "Mid",
    # Upper bound 305 follows the code in the question (its text says 300)
    ColumnY >= 201 & ColumnY <= 305 ~ "Late"
  ))
Upvotes: 2