Reputation: 483
I have a data frame with 4 columns and 8 observations:
> df1
  Rater1 Rater2 Rater4 Rater5
1      3      3      3      3
2      3      3      2      3
3      3      3      2      2
4      0      0      1      0
5      0      0      0      0
6      0      0      0      0
7      0      0      1      0
8      0      0      0      0
I would like the mean, median, IQR, and SD of all Rater1 and Rater4 observations pooled together (16 values) and of all Rater2 and Rater5 observations pooled together (16 values), without creating a new data frame with two variables like this:
> df2
   var1 var2
1     3    3
2     3    3
3     3    3
4     0    0
5     0    0
6     0    0
7     0    0
8     0    0
9     3    3
10    2    3
11    2    2
12    1    0
13    0    0
14    0    0
15    1    0
16    0    0
I would like to obtain this output (working only on the first data frame, without creating a new one):
> stat.desc(df2)
                   var1       var2
nbr.val      16.0000000 16.0000000
nbr.null      8.0000000 10.0000000
nbr.na        0.0000000  0.0000000
min           0.0000000  0.0000000
max           3.0000000  3.0000000
range         3.0000000  3.0000000
sum          18.0000000 17.0000000
median        0.5000000  0.0000000
mean          1.1250000  1.0625000
SE.mean       0.3275541  0.3590352
CI.mean.0.95  0.6981650  0.7652653
var           1.7166667  2.0625000
std.dev       1.3102163  1.4361407
coef.var      1.1646367  1.3516618
How can I do this in R?
Thank you in advance
Upvotes: 0
Views: 109
Reputation: 6663
A tidyverse/dplyr solution.
library(dplyr)

# Stack Rater4 under Rater1 and Rater5 under Rater2, then summarise both columns
bind_rows(select(df1, r14 = Rater1, r25 = Rater2),
          select(df1, r14 = Rater4, r25 = Rater5)) %>%
  summarise_all(list(
    mean = mean,
    median = median,
    sd = sd,
    iqr = IQR
  ))
#>   r14_mean r25_mean r14_median r25_median   r14_sd   r25_sd r14_iqr r25_iqr
#> 1    1.125   1.0625        0.5          0 1.310216 1.436141    2.25       3
In case you want the output similar to the one in your question, use t() to transpose the result.
t(.Last.value)
Upvotes: 0
Reputation: 887741
We can loop over the paired column names, combine each pair of columns into a single vector, and get the mean, median, IQR and sd:
out <- do.call(rbind, Map(function(x, y) {
  v1 <- c(df1[[x]], df1[[y]])
  data.frame(Mean = mean(v1), Median = median(v1),
             IQR = IQR(v1), SD = sd(v1))
}, names(df1)[1:2], names(df1)[3:4]))
row.names(out) <- paste(names(df1)[1:2], names(df1)[3:4], sep = "_")
out
#                 Mean Median  IQR       SD
# Rater1_Rater4 1.1250    0.5 2.25 1.310216
# Rater2_Rater5 1.0625    0.0 3.00 1.436141
Data:
df1 <- structure(list(Rater1 = c(3, 3, 3, 0, 0, 0, 0, 0),
                      Rater2 = c(3, 3, 3, 0, 0, 0, 0, 0),
                      Rater4 = c(3, 2, 2, 1, 0, 0, 1, 0),
                      Rater5 = c(3, 3, 2, 0, 0, 0, 0, 0)),
                 class = "data.frame", row.names = c(NA, -8L))
Upvotes: 1
Reputation: 1397
A possible base R approach:
df <- data.frame( # construct your original data frame
  Rater1 = c(3, 3, 3, 0, 0, 0, 0, 0),
  Rater2 = c(3, 3, 3, 0, 0, 0, 0, 0),
  Rater4 = c(3, 2, 2, 1, 0, 0, 1, 0),
  Rater5 = c(3, 3, 2, 0, 0, 0, 0, 0)
)
combined <- data.frame( # make a new data frame with your desired variables
  R14 = with(df, c(Rater1, Rater4)),
  R25 = with(df, c(Rater2, Rater5))
)
sapply(combined, mean) # compute mean of each column
sapply(combined, median) # median
sapply(combined, sd) # standard deviation
sapply(combined, IQR) # interquartile range
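The four separate calls above can also be folded into a single pass that never stores a combined data frame. The following is only a minimal sketch in base R: `describe` and `pairs` are hypothetical names introduced for illustration, and the data are re-created from the question so the block runs on its own.

```r
# Same data as in the question
df1 <- data.frame(
  Rater1 = c(3, 3, 3, 0, 0, 0, 0, 0),
  Rater2 = c(3, 3, 3, 0, 0, 0, 0, 0),
  Rater4 = c(3, 2, 2, 1, 0, 0, 1, 0),
  Rater5 = c(3, 3, 2, 0, 0, 0, 0, 0)
)

# Hypothetical helper: the four requested statistics as a named vector
describe <- function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x), iqr = IQR(x))
}

# Hypothetical lookup of which columns to pool under which label
pairs <- list(R14 = c("Rater1", "Rater4"), R25 = c("Rater2", "Rater5"))

# Pool each pair on the fly and summarise it; the result is a matrix with
# one row per statistic and one column per pooled pair
res <- sapply(pairs, function(cols) {
  describe(unlist(df1[cols], use.names = FALSE))
})
res
```

The layout (statistics in rows, pooled variables in columns) resembles the stat.desc output in the question; t(res) would give one row per pooled pair instead.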
Upvotes: 1
Reputation: 21442
Another solution, using a for loop to compute the statistics in one go:
First, create vectors for the raters you want to combine:
# Raters 1 and 4:
r14 <- as.integer(unlist(df1[, c("Rater1", "Rater4")]))
# Raters 2 and 5:
r25 <- as.integer(unlist(df1[, c("Rater2", "Rater5")]))
Combine these vectors in a data frame:
df <- data.frame(r14, r25)
And calculate the statistics (mean, IQR, median and sd, in that order):
for (i in seq_along(df)) {
  print(c(mean(df[, i]), IQR(df[, i]), median(df[, i]), sd(df[, i])))
}
[1] 1.125000 2.250000 0.500000 1.310216
[1] 1.062500 3.000000 0.000000 1.436141
Upvotes: 1