Reputation: 64024
Consider the iris data:
iris
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
I want to create a new column based on a comparison of the values in the variable Sepal.Length with a fixed cut-off, e.g. check whether the values are at least 5 or smaller than 5:
if Sepal.Length >= 5 assign "UP" else assign "DOWN"
to a new column "Regulation".
How can I do that?
Upvotes: 40
Views: 117812
Reputation: 12155
In the interest of updating a possible canonical, the package dplyr has the function mutate, which lets you create a new column in a data.frame in a vectorized fashion:
library(dplyr)
iris_new <- iris %>%
  mutate(Regulation = if_else(Sepal.Length >= 5, 'UP', 'DOWN'))
This makes a new column called Regulation, which consists of either 'UP' or 'DOWN' based on applying the condition to the Sepal.Length column.
The case_when function (also from dplyr) provides an easy-to-read way to chain together multiple conditions:
iris %>%
  mutate(Regulation = case_when(Sepal.Length >= 5 ~ 'High',
                                Sepal.Length >= 4.5 ~ 'Mid',
                                TRUE ~ 'Low'))
This works just like if_else, except that instead of one condition with a return value for TRUE and one for FALSE, each line has a condition (left side of ~) and a return value (right side of ~) that it returns if the condition is TRUE. If the condition is FALSE, evaluation moves on to the next condition.
In this case, rows where Sepal.Length >= 5 will return 'High', rows where Sepal.Length < 5 (since the first condition had to fail) and Sepal.Length >= 4.5 will return 'Mid', and all other rows will return 'Low'. Since TRUE is always TRUE, it is used to provide a default value.
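To make the "first matching condition wins" order concrete, here is a rough base R sketch that mimics the same three-way split with nested ifelse calls. This is a swapped-in technique for illustration, not how case_when works internally, and the vector sepal_length is a made-up example rather than the real iris column.

```r
# Hypothetical example values, chosen to hit each of the three branches.
sepal_length <- c(5.1, 4.9, 4.4)

# The outer test is checked first; only values that fail it reach the inner test,
# which is the same ordering that the case_when call above relies on.
level <- ifelse(sepal_length >= 5, "High",
                ifelse(sepal_length >= 4.5, "Mid", "Low"))

print(level)
# [1] "High" "Mid"  "Low"
```

With more than two or three conditions the nesting gets hard to read, which is the main reason to prefer case_when.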
Upvotes: 25
Reputation: 56189
Without ifelse:
iris$Regulation <- c("DOWN", "UP")[ (iris$Sepal.Length >= 5) + 1 ]
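The one-liner above relies on logical values being coerced to numbers: FALSE + 1 is 1 and TRUE + 1 is 2, so the comparison becomes an index into c("DOWN", "UP"). A small step-by-step sketch, using a made-up vector x instead of the iris column:

```r
# Hypothetical example values; the NA is included to show how missing data behaves.
x <- c(5.1, 4.9, 5.0, NA)

is_up <- x >= 5            # logical vector: TRUE FALSE TRUE NA
idx   <- is_up + 1         # coerced to numbers: 2 1 2 NA
regulation <- c("DOWN", "UP")[idx]  # position 1 is "DOWN", position 2 is "UP"

print(regulation)
# [1] "UP"   "DOWN" "UP"   NA
```

A missing value stays NA in the result, which matches what ifelse would return for that element.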
Benchmark, about 14x faster than ifelse:
bigX <- runif(10^6, 0, 10)
bench::mark(
  x1 = c("DOWN", "UP")[ (bigX >= 5) + 1 ],
  x2 = ifelse(bigX >= 5, "UP", "DOWN"),
  x3 = dplyr::if_else(bigX >= 5, "UP", "DOWN")
)
# # A tibble: 3 x 14
# expression min mean median max `itr/sec` mem_alloc n_gc n_itr total_time result memory
# <chr> <bch:t> <bch:t> <bch:t> <bch:t> <dbl> <bch:byt> <dbl> <int> <bch:tm> <list> <list>
# x1 19.1ms 23.9ms 20.5ms 31.6ms 41.9 22.9MB 9 22 525ms <chr ~ <Rpro~
# x2 278.9ms 280.2ms 280.2ms 281.5ms 3.57 118.3MB 4 2 560ms <chr ~ <Rpro~
# x3 47.8ms 64.2ms 54.1ms 138.8ms 15.6 68.7MB 11 8 514ms <chr ~ <Rpro~
Upvotes: 6
Reputation: 2361
Try
iris$Regulation <- ifelse(iris$Sepal.Length >= 5, "UP", "DOWN")
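As a quick check, the same call can be run on a copy of the data and compared against the seven rows shown in the question. The name iris_copy is only an illustrative choice so that the built-in data set is left unchanged.

```r
# Work on a copy so the built-in iris data set stays untouched.
iris_copy <- iris
iris_copy$Regulation <- ifelse(iris_copy$Sepal.Length >= 5, "UP", "DOWN")

# Show the first seven rows, matching the excerpt in the question.
first_seven <- head(iris_copy[, c("Sepal.Length", "Regulation")], 7)
print(first_seven)
#   Sepal.Length Regulation
# 1          5.1         UP
# 2          4.9       DOWN
# 3          4.7       DOWN
# 4          4.6       DOWN
# 5          5.0         UP
# 6          5.4         UP
# 7          4.6       DOWN
```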
Upvotes: 74