Yvan
Yvan

Reputation: 101

dplyr mutate on range of columns

I would like to make an operation on a range of columns on a data frame. Let us say this dataset is:

set.seed(15)
df <- data.frame(id=letters[1:10], matrix(runif(5*10), nrow=5)) %>% dplyr::rename(O6 = X7)

I think that it is quite obvious what I want to achieve with:

df %>% rowwise() %>% mutate(minval_X3_X8 = min(X3:X8))

ie to get the result of:

df %>% rowwise() %>% mutate(minval_X3_X8 = min(X3, X4, X5, O6, X7, X8))

My initial try with X3:X8 doesn't give any error message, so I am wondering:
1. What is the best way to achieve the desired output using dplyr (that I get the min/max/average etc. value from columns X3 to X8)
2. What am I actually getting when using X3:X8

Many thanks !

Ps. it would be great if the solution to this includes a possibility to also do:

df %>% rowwise() %>% mutate(minval_all_but_ex_rownames = min(-id))

Upvotes: 3

Views: 6436

Answers (4)

Maurits Evers
Maurits Evers

Reputation: 50668

You can also use purrr::map with transpose

df %>% mutate(minval_X3_X8 = map(transpose(select(., X3:X8)), ~min(as.numeric(.x))))
#id        X1        X2        X3        X4         X5        X6        O6
#1   a 0.6021140 0.9888592 0.1046694 0.8417851 0.80372740 0.6590069 0.4985587
#2   b 0.1950439 0.8151934 0.6461509 0.4474437 0.79334595 0.1069735 0.2567427
#3   c 0.9664587 0.2539684 0.5090904 0.9646670 0.35756312 0.1483839 0.4916694
#4   d 0.6509055 0.6872308 0.7066286 0.1411871 0.05800106 0.9277570 0.1174759
#5   e 0.3670719 0.8314290 0.8623137 0.7767125 0.56574614 0.4763697 0.5128062
#6   f 0.6021140 0.9888592 0.1046694 0.8417851 0.80372740 0.6590069 0.4985587
#7   g 0.1950439 0.8151934 0.6461509 0.4474437 0.79334595 0.1069735 0.2567427
#8   h 0.9664587 0.2539684 0.5090904 0.9646670 0.35756312 0.1483839 0.4916694
#9   i 0.6509055 0.6872308 0.7066286 0.1411871 0.05800106 0.9277570 0.1174759
#10  j 0.3670719 0.8314290 0.8623137 0.7767125 0.56574614 0.4763697 0.5128062
#       X8        X9       X10 minval_X3_X8
#1  0.6578783 0.9152619 0.4291725    0.1046694
#2  0.1215491 0.4574306 0.3302786    0.1069735
#3  0.5159349 0.9210739 0.7528325    0.1483839
#4  0.3016811 0.2591188 0.8438981   0.05800106
#5  0.7603278 0.3437781 0.0108724    0.4763697
#6  0.6578783 0.9152619 0.4291725    0.1046694
#7  0.1215491 0.4574306 0.3302786    0.1069735
#8  0.5159349 0.9210739 0.7528325    0.1483839
#9  0.3016811 0.2591188 0.8438981   0.05800106
#10 0.7603278 0.3437781 0.0108724    0.4763697 

Upvotes: 2

G. Grothendieck
G. Grothendieck

Reputation: 269311

Regarding your questions (1) the code in the question is acting the same as:

df %>% rowwise() %>% mutate(minval_X3_X8 = min(X3))

or just

df %>% rowwise() %>% mutate(minval_X3_X8 = X3)

and for question (2) in (1) below we rework your solution so that it works and following that provide a few other dplyr and base solutions. In the solutions below we have shown the result for min and max. These can readily be extended in the obvious way for additional statistics such as mean, sd, median, etc. by adding arguments to mutate or summarize or extending the aggregate function. Note that the solutions below all work with simple min, max, etc. which makes it easier to extend to additional statistics. Solutions which make use of pmin instead of min may be harder to extend because there may not be a ready counterpart to pmin for every statistic you want.

1) modify code in question To rework your solution we can use do. Within do a dot will refer to the current group, in this case the current row, but it will be a list so convert back to data frame. Note that we used {...} within the do to prevent the dot within {...} from referring to the current row as a list but instead to refer to data.frame(.) .

df %>% 
   rowwise %>% 
   do(as.data.frame(.) %>% { 
      subs <- select(., X3:X8)
      mutate(., Min = subs %>% min,
                Max = subs %>% max) 
      } ) %>%
   ungroup

giving:

# A tibble: 10 x 13
   id       X1    X2    X3    X4     X5    X6    O6    X8    X9    X10    Min   Max
 * <fct> <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>
 1 a     0.602 0.989 0.105 0.842 0.804  0.659 0.499 0.658 0.915 0.429  0.105  0.842
 2 b     0.195 0.815 0.646 0.447 0.793  0.107 0.257 0.122 0.457 0.330  0.107  0.793
 3 c     0.966 0.254 0.509 0.965 0.358  0.148 0.492 0.516 0.921 0.753  0.148  0.965
 4 d     0.651 0.687 0.707 0.141 0.0580 0.928 0.117 0.302 0.259 0.844  0.0580 0.928
 5 e     0.367 0.831 0.862 0.777 0.566  0.476 0.513 0.760 0.344 0.0109 0.476  0.862
 6 f     0.602 0.989 0.105 0.842 0.804  0.659 0.499 0.658 0.915 0.429  0.105  0.842
 7 g     0.195 0.815 0.646 0.447 0.793  0.107 0.257 0.122 0.457 0.330  0.107  0.793
 8 h     0.966 0.254 0.509 0.965 0.358  0.148 0.492 0.516 0.921 0.753  0.148  0.965
 9 i     0.651 0.687 0.707 0.141 0.0580 0.928 0.117 0.302 0.259 0.844  0.0580 0.928
10 j     0.367 0.831 0.862 0.777 0.566  0.476 0.513 0.760 0.344 0.0109 0.476  0.862

2) apply Using apply:

df %>% {
  Apply <- function(fun) select(., X3:X8) %>% apply(1, fun)
  mutate(., Min = Apply(min), Max = Apply(max))
}

2a) base R or with just base R:

Apply <- function(fun) apply(subset(df, select = X3:X8), 1, fun)
transform(df, Min = Apply(min), Max = Apply(max))

3) dplyr/tidyr Another possibility is dplyr with tidyr reshaping df to long form, performing the calculation in long form and then joining back to df:

library(dplyr)
library(tidyr)

df %>%
   left_join({
     gather(., key, value, -id) %>%
     filter(between(key, "X3", "X8")) %>%
     group_by(id) %>%
     summarize(Min = min(value), Max = max(value)) %>%
     ungroup
   })

3a) Base R (3) can be done in R base R using reshape to create a long form data frame, subset to reduce it to X3:X8 and merge to perform the join.

long <- reshape(df, dir = "long", varying = list(names(df)[-1]), 
  times = names(df)[-1], v.names = "min")
subs <- subset(long, time >= "X3" & time <= "X8")
merge(df, aggregate(min ~ id, subs, function(x) c(Min = min(x), Max = max(x))))

4) dplyr/purrr This one is similar to (2) except we use purrr::pmap_dbl instead of apply.

library(dplyr)
library(purrr)

df %>% {
  Pmap <- function(fun) select(., X3:X8) %>% pmap_dbl(~ fun(c(...)))
  mutate(., Min = Pmap(min), Max = Pmap(max))
}

Updates

Have made some improvements and added additional solutions.

Upvotes: 3

r2evans
r2evans

Reputation: 160397

dplyr::select can use the range-notation of X3:X7, but not other functions. If you can work outside of mutate, try this:

df$minval_X3_X8 <- do.call(pmin, select(df, X3:X8))
df
#    id        X1        X2        X3        X4         X5        O6        X7        X8        X9       X10 minval_X3_X8
# 1   a 0.6021140 0.9888592 0.1046694 0.8417851 0.80372740 0.6590069 0.4985587 0.6578783 0.9152619 0.4291725   0.10466936
# 2   b 0.1950439 0.8151934 0.6461509 0.4474437 0.79334595 0.1069735 0.2567427 0.1215491 0.4574306 0.3302786   0.10697354
# 3   c 0.9664587 0.2539684 0.5090904 0.9646670 0.35756312 0.1483839 0.4916694 0.5159349 0.9210739 0.7528325   0.14838386
# 4   d 0.6509055 0.6872308 0.7066286 0.1411871 0.05800106 0.9277570 0.1174759 0.3016811 0.2591188 0.8438981   0.05800106
# 5   e 0.3670719 0.8314290 0.8623137 0.7767125 0.56574614 0.4763697 0.5128062 0.7603278 0.3437781 0.0108724   0.47636970
# 6   f 0.6021140 0.9888592 0.1046694 0.8417851 0.80372740 0.6590069 0.4985587 0.6578783 0.9152619 0.4291725   0.10466936
# 7   g 0.1950439 0.8151934 0.6461509 0.4474437 0.79334595 0.1069735 0.2567427 0.1215491 0.4574306 0.3302786   0.10697354
# 8   h 0.9664587 0.2539684 0.5090904 0.9646670 0.35756312 0.1483839 0.4916694 0.5159349 0.9210739 0.7528325   0.14838386
# 9   i 0.6509055 0.6872308 0.7066286 0.1411871 0.05800106 0.9277570 0.1174759 0.3016811 0.2591188 0.8438981   0.05800106
# 10  j 0.3670719 0.8314290 0.8623137 0.7767125 0.56574614 0.4763697 0.5128062 0.7603278 0.3437781 0.0108724   0.47636970

Edit: as @Moody_Mudskipper suggested, this can be included in a mutate:

df %>% mutate(minval_X3_X8 = do.call(pmin, select(., X3:X8)))
# same output as above

But this should be used with caution, since its use does not honor grouping of the data. I use this next methodology periodically to see what my functions "see" when called within (optionally-grouped) mutate. The mymin function here does absolutely nothing useful, just provides a mid-mutate browse:

mymin <- function(...) { browser(); 1; }

The example:

df %>% rowwise() %>% mutate(minval_X3_X8 = do.call(mymin, select(., X3:X8)))
# Called from: (function (...) 
# {
#     browser()
#     1
# }) ...snip...
# Browse[1]> 
# debug at #1: [1] 1
# Browse[2]> 

If we look at the function's arguments, we'll see what it has been provided:

list(...)
# $X3
#  [1] 0.1046694 0.6461509 0.5090904 0.7066286 0.8623137 0.1046694 0.6461509 0.5090904 0.7066286
# [10] 0.8623137
# $X4
#  [1] 0.8417851 0.4474437 0.9646670 0.1411871 0.7767125 0.8417851 0.4474437 0.9646670 0.1411871
# [10] 0.7767125
# $X5
#  [1] 0.80372740 0.79334595 0.35756312 0.05800106 0.56574614 0.80372740 0.79334595 0.35756312
#  [9] 0.05800106 0.56574614
# $O6
#  [1] 0.6590069 0.1069735 0.1483839 0.9277570 0.4763697 0.6590069 0.1069735 0.1483839 0.9277570
# [10] 0.4763697
# $X7
#  [1] 0.4985587 0.2567427 0.4916694 0.1174759 0.5128062 0.4985587 0.2567427 0.4916694 0.1174759
# [10] 0.5128062
# $X8
#  [1] 0.6578783 0.1215491 0.5159349 0.3016811 0.7603278 0.6578783 0.1215491 0.5159349 0.3016811
# [10] 0.7603278

Had this honored the rowwise grouping, I would have expected to see something like this, representing just one row of the data:

lapply(list(...), `[`, 1)
# $X3
# [1] 0.1046694
# $X4
# [1] 0.8417851
# $X5
# [1] 0.8037274
# $O6
# [1] 0.6590069
# $X7
# [1] 0.4985587
# $X8
# [1] 0.6578783

Upvotes: 4

akrun
akrun

Reputation: 886928

With tidyverse, we can do

1) With reduce and pmin

library(tidyverse)
df %>%
  select(X3:X8) %>% 
  reduce(pmin) %>% 
  mutate(df, minval_X3_X8 = .)
# id        X1        X2        X3        X4         X5        X6        O6
#1   a 0.6021140 0.9888592 0.1046694 0.8417851 0.80372740 0.6590069 0.4985587
#2   b 0.1950439 0.8151934 0.6461509 0.4474437 0.79334595 0.1069735 0.2567427
#3   c 0.9664587 0.2539684 0.5090904 0.9646670 0.35756312 0.1483839 0.4916694
#4   d 0.6509055 0.6872308 0.7066286 0.1411871 0.05800106 0.9277570 0.1174759
#5   e 0.3670719 0.8314290 0.8623137 0.7767125 0.56574614 0.4763697 0.5128062
#6   f 0.6021140 0.9888592 0.1046694 0.8417851 0.80372740 0.6590069 0.4985587
#7   g 0.1950439 0.8151934 0.6461509 0.4474437 0.79334595 0.1069735 0.2567427
#8   h 0.9664587 0.2539684 0.5090904 0.9646670 0.35756312 0.1483839 0.4916694
#9   i 0.6509055 0.6872308 0.7066286 0.1411871 0.05800106 0.9277570 0.1174759
#10  j 0.3670719 0.8314290 0.8623137 0.7767125 0.56574614 0.4763697 0.5128062
#          X8        X9       X10 minval_X3_X8
#1  0.6578783 0.9152619 0.4291725   0.10466936
#2  0.1215491 0.4574306 0.3302786   0.10697354
#3  0.5159349 0.9210739 0.7528325   0.14838386
#4  0.3016811 0.2591188 0.8438981   0.05800106
#5  0.7603278 0.3437781 0.0108724   0.47636970
#6  0.6578783 0.9152619 0.4291725   0.10466936
#7  0.1215491 0.4574306 0.3302786   0.10697354
#8  0.5159349 0.9210739 0.7528325   0.14838386
#9  0.3016811 0.2591188 0.8438981   0.05800106
#10 0.7603278 0.3437781 0.0108724   0.47636970

2) Or convert the column names to symbols and do an evaluation

df %>% 
   mutate(minval_X3_X8 = pmin(!!! rlang::syms(names(.)[3:8])))

Upvotes: 4

Related Questions