EduardoCabria

Reputation: 29

R: combine several apply() functions in a pipe

I'm working with the well-known 'palmerpenguins' dataset. In particular, I want to calculate the minimum and maximum of all numeric columns, and I'm doing it in a pipeline.

install.packages("palmerpenguins")
library(palmerpenguins)
library(dplyr)   # provides %>% and select_if()
library(tibble)  # provides rownames_to_column()

penguins_raw %>%
      select_if(is.numeric) %>%
      apply(2, min, na.rm = TRUE) %>%
      apply(2, max, na.rm = TRUE) %>%
      as.data.frame() %>%
      rownames_to_column(var = 'col')

But I get this error:

Error in apply(., 2, max, na.rm = TRUE) : dim(X) must have a positive length

My question is: is it possible to combine two apply() calls in the same pipeline? How can I do this? Thank you :)

Upvotes: 0

Views: 658

Answers (1)

Ronak Shah

Reputation: 388817

If you break this down step by step, it is easier to understand. After taking the minimum of each column, the output is:

library(dplyr)
library(palmerpenguins)

penguins_raw %>%
  select_if(is.numeric) %>%
  apply(2, min, na.rm = TRUE)

#      Sample Number  Culmen Length (mm)   Culmen Depth (mm) Flipper Length (mm) 
#               1.00               32.10               13.10              172.00 
#      Body Mass (g)   Delta 15 N (o/oo)   Delta 13 C (o/oo) 
#            2700.00                7.63              -27.02 

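You are then passing this output to apply(2, max, na.rm = TRUE), which is not what you want: you want the maximum of each column of the penguins_raw dataset, not of the output above. That output is a named vector with no dimensions, so the second apply() has no columns to iterate over and fails with the error you see.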
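The failure can be reproduced without the penguins data. The following is a minimal sketch on a toy data frame (the names df, mins and msg are made up for illustration); it shows that the first apply() returns a plain vector and that the second one stops for that reason.

```r
# Toy data frame standing in for the numeric columns of penguins_raw
df <- data.frame(a = c(1, 5, NA), b = c(10, 2, 7))

# First apply(): collapses each column to a single value,
# so the result is a named vector, not a matrix or data frame
mins <- apply(df, 2, min, na.rm = TRUE)

is.null(dim(mins))  # TRUE: a plain vector has no dim attribute

# Second apply(): MARGIN = 2 asks for columns, but there are none left.
# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch(
  apply(mins, 2, max, na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
msg
```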


If you are using pipes and dplyr, there are dedicated functions for this kind of calculation. In this case you can use across().

penguins_raw %>%
  summarise(across(where(is.numeric), list(min = ~min(., na.rm = TRUE), 
                                           max = ~max(., na.rm = TRUE))))
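When across() receives a named list of functions, dplyr's default is to name each result column as the original column name, an underscore, and the function name. A small sketch on a toy data frame (df and out are illustrative names, not from the question) makes the resulting layout visible:

```r
library(dplyr)

# Toy data frame standing in for penguins_raw
df <- data.frame(a = c(1, 5, NA), b = c(10, 2, 7))

out <- df %>%
  summarise(across(where(is.numeric),
                   list(min = ~min(.x, na.rm = TRUE),
                        max = ~max(.x, na.rm = TRUE))))

# One row, two columns per input column: <column>_<function>
names(out)
```

Because the separator is an underscore, any later step that splits these names on '_' assumes the original column names contain no underscore themselves, which holds for the numeric columns of penguins_raw shown here.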

Or, if you are on an older version of dplyr (before 1.0.0, which introduced across()), use summarise_if():

penguins_raw %>%
  summarise_if(is.numeric, list(min = ~min(., na.rm = TRUE), 
                                max = ~max(., na.rm = TRUE)))

To get the data into a 3-column format, we can use pivot_longer() from tidyr.

library(tidyr)

penguins_raw %>%
  summarise(across(where(is.numeric), list(min = ~min(., na.rm = TRUE), 
                                           max = ~max(., na.rm = TRUE)))) %>%
  pivot_longer(cols = everything(), 
               names_to = c('name', '.value'), 
               names_sep = '_')

#  name                    min    max
#  <chr>                 <dbl>  <dbl>
#1 Sample Number          1     152  
#2 Culmen Length (mm)    32.1    59.6
#3 Culmen Depth (mm)     13.1    21.5
#4 Flipper Length (mm)  172     231  
#5 Body Mass (g)       2700    6300  
#6 Delta 15 N (o/oo)      7.63   10.0
#7 Delta 13 C (o/oo)    -27.0   -23.8
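If you would rather stay with apply(), one option is to call it once with a function that returns both statistics, instead of chaining two calls. This is only a sketch on a toy data frame: min_max is a hypothetical helper, not part of any package, and the native pipe |> assumes R 4.1 or later.

```r
# Toy data frame standing in for the numeric columns of penguins_raw
df <- data.frame(a = c(1, 5, NA), b = c(10, 2, 7))

# Hypothetical helper: returns both statistics for one column
min_max <- function(x) {
  c(min = min(x, na.rm = TRUE), max = max(x, na.rm = TRUE))
}

result <- df |>
  apply(2, min_max) |>  # matrix with rows "min"/"max", one column per variable
  t() |>                # transpose so that each variable becomes a row
  as.data.frame()

# Move the row names into a regular column, base-R style
result$col <- rownames(result)
rownames(result) <- NULL
result <- result[, c("col", "min", "max")]
result
```

The single call works because apply() still receives the full data frame; the two-call version fails only because the second call receives the already collapsed vector.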

Upvotes: 1
