Reputation: 450
I have a quick question
I have a data frame with many measurement columns. I wanted to calculate the mean of the columns that share the same header name, so I used the code below, which I found in this Stack Overflow question:
How to calculate the mean of those columns in a data frame with the same column name
As example data:
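df <- data.frame(c(1, 2, 3, 4, 5),
                 c(2, 3, 4, NA, 2),
                 c(3, 4, 5, 3, 6),
                 c(3, 7, NA, 3, 6))
names(df) <- c("a", "b", "a", "b")
# Split the columns by name, then take the row means within each group
# (stored in a new object so that df itself is kept for later use)
means <- sapply(split.default(df, names(df)), rowMeans, na.rm = TRUE)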
The result looks like this:
       a   b
[1,] 2.0 2.5
[2,] 3.0 5.0
[3,] 4.0 4.0
[4,] 3.5 3.0
[5,] 5.5 4.0
This code gives me the mean of the columns with the same header name.
But I want the standard deviation too. I tried replacing rowMeans with rowSds, but it didn't work.
Any idea how to use the same code to calculate the standard deviation along with the mean?
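The one-liner above works in two steps. Pulling them apart with the question's example data (a sketch; `groups` is just an illustrative variable name) shows what `sapply` actually receives and what any replacement for `rowMeans` has to do: take one data frame of same-named columns and return one value per row.

```r
df <- data.frame(c(1, 2, 3, 4, 5),
                 c(2, 3, 4, NA, 2),
                 c(3, 4, 5, 3, 6),
                 c(3, 7, NA, 3, 6))
names(df) <- c("a", "b", "a", "b")

# split.default() splits the columns (not the rows) by their names,
# returning a named list with one data frame per distinct header
groups <- split.default(df, names(df))
names(groups)         # "a" "b"
sapply(groups, ncol)  # a: 2, b: 2

# sapply() then calls the supplied function once per group; the function
# must map a data frame to one value per row, as rowMeans() does
means <- sapply(groups, rowMeans, na.rm = TRUE)
means[, "a"]          # 2.0 3.0 4.0 3.5 5.5
```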
Upvotes: 1
Views: 3113
Reputation: 1857
One idea, based on your previous approach, is the following:
sapply(split.default(df, names(df)), function(x) apply(x, 1, sd, na.rm=TRUE))
# a b
# [1,] 1.4142136 0.7071068
# [2,] 1.4142136 2.8284271
# [3,] 1.4142136 NA
# [4,] 0.7071068 NA
# [5,] 0.7071068 2.8284271
Keep in mind that NAs are returned for rows where only one non-missing value remains, because sd is undefined for a sample of size 1.
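`rowSds` is not part of base R (it comes from add-on packages such as matrixStats), which is why a plain swap for `rowMeans` does not work out of the box. If you would rather keep the vectorised style of `rowMeans` instead of calling `apply` row by row, a row-wise sample standard deviation can be written by hand. This is a sketch under that assumption; `row_sd` is a hypothetical helper name, and it should reproduce the `apply(x, 1, sd, na.rm = TRUE)` result above, including the NAs.

```r
df <- data.frame(c(1, 2, 3, 4, 5),
                 c(2, 3, 4, NA, 2),
                 c(3, 4, 5, 3, 6),
                 c(3, 7, NA, 3, 6))
names(df) <- c("a", "b", "a", "b")

# Hypothetical helper: row-wise sample standard deviation of a data frame,
# ignoring NAs, using only base R
row_sd <- function(x) {
  m <- as.matrix(x)
  n <- rowSums(!is.na(m))                  # non-missing values in each row
  mu <- rowMeans(m, na.rm = TRUE)          # row means, as in the question
  ss <- rowSums((m - mu)^2, na.rm = TRUE)  # mu is recycled down the columns, so row i uses mu[i]
  out <- sqrt(ss / (n - 1))
  out[n < 2] <- NA                         # sd is undefined for fewer than 2 values
  out
}

groups <- split.default(df, names(df))
means <- sapply(groups, rowMeans, na.rm = TRUE)
sds   <- sapply(groups, row_sd)
```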
Upvotes: 3
Reputation: 181
Here's a user-defined function which could be useful. You may like to check it out:
Upvotes: 0
Reputation: 564
This should work:
df <- data.frame(c(1, 2, 3),
                 c(2, 3, 4),
                 c(3, 4, 5))
names(df) <- c("a", "b", "a")
sapply(split.default(df, names(df)), function(smaller_df) {
  sapply(smaller_df, function(col) c(mean(col), sd(col)))
})
The outer sapply works on each data frame produced by split.default, each of which holds the set of columns that share one name. The inner sapply then applies to each column of that group and returns its mean and standard deviation.
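A small variant of the same nested `sapply` (a sketch using this answer's example data) names the two statistics, so each group's result is labelled `mean` and `sd` by row. Because group "a" has two columns and group "b" has one, the per-group results differ in length, and the outer `sapply` returns a list rather than a matrix.

```r
df <- data.frame(c(1, 2, 3),
                 c(2, 3, 4),
                 c(3, 4, 5))
names(df) <- c("a", "b", "a")

res <- sapply(split.default(df, names(df)), function(smaller_df) {
  # Inner sapply: one named c(mean, sd) pair per column in the group
  sapply(smaller_df, function(col) c(mean = mean(col), sd = sd(col)))
})

is.list(res)     # TRUE, since the groups have different numbers of columns
res$a["mean", ]  # means of the two "a" columns: 2 and 4
res$b[, 1]       # mean 3, sd 1
```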
If you want the mean and standard deviation of all the measurements in the same-named columns combined, rather than treating each column as a separate sample, change the inner sapply to:
sapply(list(unlist(smaller_df)), function(col) c(mean(col), sd(col)))
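Since `sapply` over a one-element `list()` simply calls the function once, the pooled version can also be written without that wrapper. A sketch with the same example data; the `na.rm = TRUE` arguments are an added assumption, in case the real data contain missing values like the question's.

```r
df <- data.frame(c(1, 2, 3),
                 c(2, 3, 4),
                 c(3, 4, 5))
names(df) <- c("a", "b", "a")

pooled <- sapply(split.default(df, names(df)), function(smaller_df) {
  # Pool every value from the same-named columns into a single sample
  vals <- unlist(smaller_df, use.names = FALSE)
  c(mean = mean(vals, na.rm = TRUE), sd = sd(vals, na.rm = TRUE))
})

# Each group now returns exactly two values, so the result is a 2 x 2 matrix
# with rows "mean" and "sd": a has mean 3 and sd sqrt(2), b has mean 3 and sd 1
pooled
```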
Upvotes: 1