Kathiravan Meeran

Reputation: 450

R calculate standard deviation for the columns with same names

I have a quick question

I have a data frame with many measurement columns, and I wanted to calculate the mean across the columns that share the same (header) name. I used the code below, which I found on Stack Overflow in this question:

How to calculate the mean of those columns in a data frame with the same column name

As example data:

df <- data.frame(c(1, 2, 3, 4,5),
                 c(2, 3, 4,NA,2),
                 c(3, 4, 5,3,6),
                 c(3, 7, NA,3,6))
names(df) <- c("a", "b", "a", "b")

df <- sapply(split.default(df, names(df)), rowMeans, na.rm = TRUE) 

The result looks like this:

a    b
2    2.5
3    5
4    4
3.5  3
5.5  4

This code gave me the row-wise mean of the columns with the same (header) name.

But I want the standard deviation too. I tried replacing rowMeans with rowSds, but it didn't work.

Any idea how to use the same code to calculate the standard deviation along with the mean?

Upvotes: 1

Views: 3113

Answers (3)

storaged

Reputation: 1857

One idea, based on your previous approach, is to do the following:

sapply(split.default(df, names(df)), function(x) apply(x, 1, sd, na.rm=TRUE))
#              a         b
# [1,] 1.4142136 0.7071068
# [2,] 1.4142136 2.8284271
# [3,] 1.4142136        NA
# [4,] 0.7071068        NA
# [5,] 0.7071068 2.8284271

Keep in mind that the NAs appear because, with na.rm=TRUE, rows 3 and 4 of b are left with a single non-missing value, and sd returns NA for a sample of size 1.
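
To get the mean and the standard deviation together, and to see where the NAs come from, something along these lines might help. This is only a sketch: row_stats is a hypothetical helper name (not a base R function), and the n column is an addition that counts the non-missing values in each row.

```r
# Example data from the question: two columns named "a", two named "b"
df <- data.frame(c(1, 2, 3, 4, 5),
                 c(2, 3, 4, NA, 2),
                 c(3, 4, 5, 3, 6),
                 c(3, 7, NA, 3, 6))
names(df) <- c("a", "b", "a", "b")

# row_stats is a hypothetical helper, not part of base R
row_stats <- function(x) {
  m <- as.matrix(x)
  cbind(mean = rowMeans(m, na.rm = TRUE),
        sd   = apply(m, 1, sd, na.rm = TRUE),
        n    = rowSums(!is.na(m)))  # non-missing values per row
}

# lapply keeps one matrix per column name instead of flattening them
stats <- lapply(split.default(df, names(df)), row_stats)
stats$b
```

Rows where n is 1 are the ones where sd comes back as NA. Using lapply rather than sapply is a design choice here: it returns a named list with one mean/sd/n matrix per column name, which may be easier to read than a single flattened matrix.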

Upvotes: 3

jchevali

Reputation: 181

Here's a user-defined function that could be useful. You may want to check it out:

rowVars

Upvotes: 0

Julian Zucker

Reputation: 564

This should work:

df <- data.frame(c(1, 2, 3),
                 c(2, 3, 4),
                 c(3, 4, 5))
names(df) <- c("a", "b", "a")


sapply(split.default(df, names(df)), function(smaller_df) {
  sapply(smaller_df, function(col) c(mean(col), sd(col)))
})

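The first sapply works on each data.frame produced by split.default, each of which corresponds to the set of columns that have the same name. The second sapply applies to each column in that set, so the mean and standard deviation are computed down each column: in each result, the first row is the mean and the second row is the standard deviation.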
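
To see what the outer sapply iterates over, it may help to inspect the split on its own. A small sketch using the same example data (the variable name groups is just for illustration):

```r
df <- data.frame(c(1, 2, 3),
                 c(2, 3, 4),
                 c(3, 4, 5))
names(df) <- c("a", "b", "a")

# One list element per distinct column name
groups <- split.default(df, names(df))

names(groups)          # the distinct column names
sapply(groups, ncol)   # how many original columns ended up in each piece
sapply(groups, class)  # each piece is still a data.frame
```

Here the "a" piece holds two columns and the "b" piece holds one, which is why the per-column results for the two names have different shapes.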

If you wanted the mean and standard deviation of all the measurements under a given column name combined, instead of treating each column as a separate sample, you would change the inner sapply to:

sapply(list(unlist(smaller_df)), function(col) c(mean(col), sd(col)))
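
Put together, the pooled variant could look roughly like the sketch below. It calls mean and sd directly on the unlisted values rather than going through an inner sapply, which should give the same numbers. The names pooled and values are illustrative only.

```r
df <- data.frame(c(1, 2, 3),
                 c(2, 3, 4),
                 c(3, 4, 5))
names(df) <- c("a", "b", "a")

pooled <- sapply(split.default(df, names(df)), function(smaller_df) {
  # All measurements under this column name, flattened into one vector
  values <- unlist(smaller_df)
  c(mean = mean(values), sd = sd(values))
})

pooled  # rows "mean" and "sd", one column per distinct name
```

If the data contains missing values, as in the question, na.rm = TRUE would need to be passed to both mean and sd.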

Upvotes: 1
