ArTu

Reputation: 483

How to compute a single mean of multiple columns?

I have a data frame with 4 columns and 8 observations:

> df1
  Rater1 Rater2 Rater4 Rater5
1      3      3      3      3
2      3      3      2      3
3      3      3      2      2
4      0      0      1      0
5      0      0      0      0
6      0      0      0      0
7      0      0      1      0
8      0      0      0      0

I would like to get the mean, median, IQR and SD of all Rater1 and Rater4 observations pooled together (16 values), and likewise of all Rater2 and Rater5 observations (16 values), without creating a new data frame with 2 variables like this:

> df2
   var1 var2
1     3    3
2     3    3
3     3    3
4     0    0
5     0    0
6     0    0
7     0    0
8     0    0
9     3    3
10    2    3
11    2    2
12    1    0
13    0    0
14    0    0
15    1    0
16    0    0

I would like to obtain this output (here from stat.desc() in the pastecs package) without building a new data frame, working only on the first one:

> stat.desc(df2)
                   var1       var2
nbr.val      16.0000000 16.0000000
nbr.null      8.0000000 10.0000000
nbr.na        0.0000000  0.0000000
min           0.0000000  0.0000000
max           3.0000000  3.0000000
range         3.0000000  3.0000000
sum          18.0000000 17.0000000
median        0.5000000  0.0000000
mean          1.1250000  1.0625000
SE.mean       0.3275541  0.3590352
CI.mean.0.95  0.6981650  0.7652653
var           1.7166667  2.0625000
std.dev       1.3102163  1.4361407
coef.var      1.1646367  1.3516618

How can I do this in R?

Thank you in advance

Upvotes: 0

Views: 109

Answers (4)

Till

Reputation: 6663

A tidyverse/dplyr solution.

library(dplyr)

# Stack Rater1 on top of Rater4 (r14) and Rater2 on top of Rater5 (r25)
bind_rows(select(df1, r14 = Rater1, r25 = Rater2),
          select(df1, r14 = Rater4, r25 = Rater5)) %>%
  summarise_all(list(
    mean = mean,
    median = median,
    sd = sd,
    iqr = IQR
  ))
#>   r14_mean r25_mean r14_median r25_median   r14_sd   r25_sd r14_iqr r25_iqr
#> 1    1.125   1.0625        0.5          0 1.310216 1.436141    2.25       3

If you want the output laid out like the one in your question, use t() to transpose the result.

t(.Last.value)
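Note that .Last.value only holds the most recent top-level result of an interactive session, so in a script it is safer to store the summary first. A minimal base R sketch of what the transposition does, assuming a one-row summary stored in a hypothetical variable `res` (built by hand here as a stand-in for the dplyr result, with made-up column names):

```r
# Rater1 followed by Rater4, taken from the data in the question
v <- c(3, 3, 3, 0, 0, 0, 0, 0, 3, 2, 2, 1, 0, 0, 1, 0)

# Hypothetical one-row summary, a stand-in for the summarised data frame
res <- data.frame(r14_mean = mean(v), r14_median = median(v),
                  r14_sd = sd(v), r14_iqr = IQR(v))

# t() converts the data frame to a matrix and flips it:
# one row per statistic, a single unnamed column
res_t <- t(res)
res_t
```

The result is a numeric matrix rather than a data frame, which is usually fine for printing.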

Upvotes: 0

akrun

Reputation: 887741

We can loop over the pairs of column names that belong together, concatenate each pair of columns into a single vector, and compute the mean, median, IQR and SD of that vector:

out <- do.call(rbind, Map(function(x, y) {
  v1 <- c(df1[[x]], df1[[y]])   # pool the two columns into one vector
  data.frame(Mean = mean(v1), Median = median(v1),
             IQR = IQR(v1), SD = sd(v1))
}, names(df1)[1:2], names(df1)[3:4]))

row.names(out) <- paste(names(df1)[1:2], names(df1)[3:4], sep = "_")
out
#                Mean Median  IQR       SD
#Rater1_Rater4 1.1250    0.5 2.25 1.310216
#Rater2_Rater5 1.0625    0.0 3.00 1.436141

data

df1 <- structure(list(Rater1 = c(3, 3, 3, 0, 0, 0, 0, 0), Rater2 = c(3, 
3, 3, 0, 0, 0, 0, 0), Rater4 = c(3, 2, 2, 1, 0, 0, 1, 0), Rater5 = c(3, 
3, 2, 0, 0, 0, 0, 0)), class = "data.frame", row.names = c(NA, 
-8L))

Upvotes: 1

Aaron Montgomery

Reputation: 1397

A possible base R approach:

df <- data.frame(                     # construct your original dataframe
  Rater1 = c(3, 3, 3, 0, 0, 0, 0, 0),
  Rater2 = c(3, 3, 3, 0, 0, 0, 0, 0),
  Rater4 = c(3, 2, 2, 1, 0, 0, 1, 0),
  Rater5 = c(3, 3, 2, 0, 0, 0, 0, 0)
)

combined <- data.frame(               # make a new dataframe with your desired variables
  R14 = with(df, c(Rater1, Rater4)),  
  R25 = with(df, c(Rater2, Rater5))  
)

sapply(combined, mean)                # compute mean of each column
sapply(combined, median)              # median
sapply(combined, sd)                  # standard deviation
sapply(combined, IQR)                 # interquartile range
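
The four calls above can also be folded into one, and the pooled values kept in a plain list rather than a second data frame. A minimal sketch, assuming the same data; `pooled` and the helper `describe()` are names made up for this illustration, not part of any package:

```r
# Same data as in the question
df <- data.frame(
  Rater1 = c(3, 3, 3, 0, 0, 0, 0, 0),
  Rater2 = c(3, 3, 3, 0, 0, 0, 0, 0),
  Rater4 = c(3, 2, 2, 1, 0, 0, 1, 0),
  Rater5 = c(3, 3, 2, 0, 0, 0, 0, 0)
)

# Pooled observations held in a list, not in a new data frame
pooled <- list(
  R14 = c(df$Rater1, df$Rater4),
  R25 = c(df$Rater2, df$Rater5)
)

# Hypothetical helper: returns all four statistics as a named vector
describe <- function(v) {
  c(mean = mean(v), median = median(v), sd = sd(v), iqr = IQR(v))
}

# sapply() binds the named vectors column-wise:
# rows are statistics, columns are the rater pairs
stats <- sapply(pooled, describe)
stats
```

Because each call to describe() returns a named vector of the same length, sapply() simplifies the result to a matrix, which gives a layout close to the stat.desc() table in the question.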

Upvotes: 1

Chris Ruehlemann

Reputation: 21442

Another solution, using a for loop to compute all the statistics in one pass. First, create a vector for each pair of raters you want to combine:

# Raters 1 and 4:
r14 <- as.integer(unlist(df1[, c("Rater1", "Rater4")]))
# Raters 2 and 5:
r25 <- as.integer(unlist(df1[, c("Rater2", "Rater5")]))

Combine these vectors in a data frame:

df <- data.frame(r14, r25)

And calculate the statistics (mean, IQR, median and SD, in that order) for each column:

for (i in seq_len(ncol(df))) {
  print(c(mean(df[, i]), IQR(df[, i]), median(df[, i]), sd(df[, i])))
}
[1] 1.125000 2.250000 0.500000 1.310216
[1] 1.062500 3.000000 0.000000 1.436141

Upvotes: 1