fifigoblin

Reputation: 455

Calculate SE per year in a multi-column df

I have a list of several dataframes that look like this:

Year  tex21.1  tex21.2  tex21.3 
2015  0.2      NA       1
2016  0.3      0.4      0.99
2017  0.5      1.2      0.6
2018  NA       1.5      0.5

I want to write a function that takes each of these dataframes and calculates, for every year (that is, for every row), (1) the mean across all value columns and (2) the standard error across all value columns, ignoring NAs. I would then apply it to the whole list, like this:

calculate_function <- function(x) {
  # do something
}

calcs <- lapply(my_list, calculate_function)

> calcs[[1]] 

  Year  mean     SE
  2015  value    value            
  2016  value    value     
  2017  value    value   
  2018  value    value      

Upvotes: 1

Views: 36

Answers (2)

ThomasIsCoding

Reputation: 101257

Another base R option, similar to @akrun's answer:

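calculate_function <- function(x) {
  cbind(
    x[1],  # keep the Year column
    do.call(
      rbind,  # stack the one-row data frames returned for each year
      apply(
        x[-1],  # all value columns
        1,      # MARGIN = 1: work row by row
        function(v) {
          data.frame(
            mean = mean(v, na.rm = TRUE),
            # SE = sd / sqrt(number of non-missing values)
            se = sd(v, na.rm = TRUE) / sqrt(sum(!is.na(v)))
          )
        }
      )
    )
  )
}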
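
The se line above is the sample standard deviation divided by the square root of the number of non-missing values. As a quick sanity check (only a sketch, using the 2015 row from the question, where tex21.2 is NA):

```r
# Values for 2015 from the question's example data (tex21.2 is missing)
v <- c(0.2, NA, 1)

n  <- sum(!is.na(v))                 # number of non-missing values
m  <- mean(v, na.rm = TRUE)          # mean of the non-missing values
se <- sd(v, na.rm = TRUE) / sqrt(n)  # standard error of the mean

print(c(mean = m, se = se))
```

This should give a mean of 0.6 and an se of 0.4, which matches the first row of the output shown in @akrun's answer.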

Upvotes: 1

akrun

Reputation: 887068

If we need to calculate the mean and SE for each row, use apply with MARGIN = 1. Loop over the list elements with lapply, and within each data frame loop over the rows with apply:

calculate_function <- function(x) {
  cbind(
    x['Year'],
    t(apply(x[-1], 1, function(u)
      c(mean = mean(u, na.rm = TRUE),
        se = plotrix::std.error(u, na.rm = TRUE))))
  )
}
lapply(my_list, calculate_function)

-output

#[[1]]
#  Year      mean        se
#1 2015 0.6000000 0.4000000
#2 2016 0.5633333 0.2152776
#3 2017 0.7666667 0.2185813
#4 2018 1.0000000 0.5000000

#[[2]]
#  Year      mean        se
#1 2015 0.6000000 0.4000000
#2 2016 0.5633333 0.2152776
#3 2017 0.7666667 0.2185813
#4 2018 1.0000000 0.5000000

data

my_list <- list(structure(list(Year = 2015:2018, tex21.1 = c(0.2, 0.3, 0.5, 
NA), tex21.2 = c(NA, 0.4, 1.2, 1.5), tex21.3 = c(1, 0.99, 0.6, 
0.5)), class = "data.frame", row.names = c(NA, -4L)), structure(list(
    Year = 2015:2018, tex21.1 = c(0.2, 0.3, 0.5, NA), tex21.2 = c(NA, 
    0.4, 1.2, 1.5), tex21.3 = c(1, 0.99, 0.6, 0.5)), 
    class = "data.frame", row.names = c(NA, 
-4L)))
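
If installing plotrix is not an option, the same quantities can probably be computed without apply by using rowMeans and rowSums. The following is only a sketch: the name calculate_function_vec is made up for illustration, and it assumes the first column is Year and every other column is numeric. One difference from sd(): a row with a single non-missing value gives NaN here rather than NA.

```r
# Hypothetical vectorised variant (not from the answers above)
calculate_function_vec <- function(x) {
  vals <- as.matrix(x[-1])                # all columns except Year
  n    <- rowSums(!is.na(vals))           # non-missing count per row
  m    <- rowMeans(vals, na.rm = TRUE)    # mean per row
  # sum of squared deviations from the row mean, ignoring NAs
  ss   <- rowSums((vals - m)^2, na.rm = TRUE)
  se   <- sqrt(ss / (n - 1)) / sqrt(n)    # sd / sqrt(n)
  data.frame(Year = x$Year, mean = m, se = se)
}

# Example data copied from the question
df <- data.frame(
  Year    = 2015:2018,
  tex21.1 = c(0.2, 0.3, 0.5, NA),
  tex21.2 = c(NA, 0.4, 1.2, 1.5),
  tex21.3 = c(1, 0.99, 0.6, 0.5)
)

res <- calculate_function_vec(df)
print(res)
```

For a list of such data frames, lapply(my_list, calculate_function_vec) should work the same way as with the apply-based function.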

Upvotes: 1
