Reputation: 101
For a data frame, I need to find the minimum and maximum values, by row, for an unknown number of columns from column 2 onwards. Here is an example:
library(tidyverse)
# test data
(test_data <- tibble(id = c(1:9),
x = runif(9),
x2 = runif(9),
x3 = runif(9)))
samples = 100
# This example, which specifies the column names, correctly finds the min and max values by row
(test_1 <- test_data %>%
rowwise() %>%
mutate(min_val = min(x, x2, x3), max_val = max(x, x2, x3)))
# This example does not
(test_2 <- test_data %>%
rowwise() %>%
mutate(min_val = min(x:x3), max_val = max(x:x3)))
What I actually want to do is something like
mutate(min_val = min([,2:samples+1]), max_val = max([,2:samples+1])))
because (1) I wish to retain the id column (for later joining with another data frame), and (2) specifying by column position seems an obvious way of doing this as I am unconcerned with the column name and samples may be large.
Thank you!
Edited example
This (as suggested)
test_data %>%
nest(-id) %>% # nest rest of columns apart from id
mutate(min_val = map(data, min), # get min and max
max_val = map(data, max)) %>%
unnest()
works on the original test data. However, the real-world data has repetitions of id, e.g.
(test_data <- tibble(id = c(1:9, 1:9),
x = runif(18),
x2 = runif(18),
x3 = runif(18)))
and this results in 'Error: All nested columns must have the same number of elements.'.
Upvotes: 4
Views: 2193
Reputation: 16121
A possible tidyverse
solution is to nest
any columns you have, apart from id
, and then use map
to get min
and max
. You won't need to specify any column names:
library(tidyverse)
# test data
(test_data <- tibble(id = c(1:9),
x = runif(9),
x2 = runif(9),
x3 = runif(9)))
samples = 100
test_data %>%
nest(-id) %>% # nest rest of columns apart from id
mutate(min_val = map(data, min), # get min and max
max_val = map(data, max)) %>%
unnest() # unnest columns
# # A tibble: 9 x 6
# id min_val max_val x x2 x3
# <int> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 0.0217 0.239 0.130 0.0217 0.239
# 2 2 0.125 0.814 0.625 0.814 0.125
# 3 3 0.281 0.770 0.331 0.770 0.281
# 4 4 0.123 0.868 0.123 0.644 0.868
# 5 5 0.149 0.340 0.149 0.340 0.337
# 6 6 0.496 0.865 0.596 0.865 0.496
# 7 7 0.0766 0.984 0.0766 0.656 0.984
# 8 8 0.272 0.926 0.702 0.926 0.272
# 9 9 0.433 0.912 0.912 0.433 0.590
In the case of having multiple ids you can use this:
test_data %>%
mutate(row_id = row_number()) %>% # create a row identifier
nest(-id, -row_id) %>% # nest rest of columns apart from id and row id
mutate(min_val = map(data, min), # get min and max
max_val = map(data, max)) %>%
unnest() # unnest columns
Upvotes: 3
Reputation: 886938
Here is one option with pmin/pmax
library(tidyverse)
test_data %>%
mutate(min_val = pmin(!!! rlang::syms(names(.)[-1])),
max_val = pmax(!!! rlang::syms(names(.)[-1])))
# A tibble: 9 x 6
# id x x2 x3 min_val max_val
# <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 1 0.293 0.255 0.501 0.255 0.501
#2 2 0.225 0.605 0.139 0.139 0.605
#3 3 0.704 0.371 0.0939 0.0939 0.704
#4 4 0.519 0.672 0.552 0.519 0.672
#5 5 0.663 0.673 0.725 0.663 0.725
#6 6 0.920 0.320 0.138 0.138 0.920
#7 7 0.280 0.904 0.223 0.223 0.904
#8 8 0.764 0.198 0.688 0.198 0.764
#9 9 0.802 0.0442 0.0765 0.0442 0.802
set.seed(24)
test_data <- tibble(id = c(1:9),
x = runif(9),
x2 = runif(9),
x3 = runif(9))
Upvotes: 0