Reputation: 183
I have a dataframe in the following format (numeric columns, with the first row giving each column's name; data can be missing):
col1.name | col2.name | col3.name | ...
132 | 12.1 | NA | ...
12.4 | NA | 14.6 | ...
13 | 1441 | 535 | ...
For each column, I want to calculate its mean, median, and standard deviation and add them to a dataframe in this format:
col.name | mean | median | sd
col1.name | 123 | 456 | 12.2
col2.name | 12.1 | 45 | 32.1
col3.name | 111 | 14.6 | 69.2
... | ... | ... | ...
I currently have the following code, but it fails with the error "'x' must be numeric". How can I fix this?
data.frame(ID=hvbp.analysis.df[,1], Means=rowMeans(hvbp.analysis.df[,-1]))
apply(hvbp.analysis.df, 2, mean, na.rm = TRUE)
Upvotes: 0
Views: 89
Reputation: 43334
If you reshape to long form first, e.g. with tidyr::gather, the rest is pretty typical aggregation:
library(tidyverse)
df <- data.frame(col1.name = c(132, 12.4, 13),
                 col2.name = c(12.1, NA, 1441),
                 col3.name = c(NA, 14.6, 535))
df %>%
    gather(col.name, value) %>%
    group_by(col.name) %>%
    summarise(mean = mean(value, na.rm = TRUE),
              median = median(value, na.rm = TRUE),
              sd = sd(value, na.rm = TRUE))
#> # A tibble: 3 x 4
#>   col.name   mean median    sd
#>   <chr>     <dbl>  <dbl> <dbl>
#> 1 col1.name  52.5   13.0  68.9
#> 2 col2.name 727.   727.  1010.
#> 3 col3.name 275.   275.   368.
summary and skimr::skim also provide similar summaries.
Upvotes: 1
Reputation: 47146
With data.frame d
d <- data.frame(a=1:3, b=4:6, c=c(5,5,5))
You can do
t(apply(d, 2, function(i) c(mean=mean(i), median=median(i), sd=sd(i))))
#   mean median sd
# a    2      2  1
# b    5      5  1
# c    5      5  0
If you have NAs to take care of:
t(apply(d, 2, function(i, ...) c(mean=mean(i,...), median=median(i,...), sd=sd(i,...)), na.rm=TRUE))
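If you want the result as a data frame with a col.name column (the layout asked for in the question) rather than a matrix, something like the sketch below might work. It assumes every column is already numeric, uses three columns shaped like the question's sample data, and the name stats is just a placeholder.

```r
# Example data shaped like the question's (values are illustrative)
df <- data.frame(col1.name = c(132, 12.4, 13),
                 col2.name = c(12.1, NA, 1441),
                 col3.name = c(NA, 14.6, 535))

# One sapply() call per statistic; na.rm = TRUE is passed through to each function
stats <- data.frame(
  col.name = names(df),
  mean     = sapply(df, mean,   na.rm = TRUE),
  median   = sapply(df, median, na.rm = TRUE),
  sd       = sapply(df, sd,     na.rm = TRUE)
)

# Drop the row names inherited from the named sapply() results
rownames(stats) <- NULL
print(stats)
```

Using sapply on the data frame itself avoids the matrix conversion that apply performs, so a stray non-numeric column shows up as an NA with a warning (or an error, depending on the function) for that column instead of silently turning every column into character.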
Upvotes: 0
Reputation: 21709
This works.
df <- data.frame(col1name = c(132, 12.4, 13), col2name = c(12.1,NA,1441), col3name = c(NA,14.6,535))
new_df <- data.frame(col_name = colnames(df))
for(i in c('mean','median','sd'))
{
  # apply over the columns of df (MARGIN = 2); apply() accepts the function name as a string
  new_df[[i]] <- apply(df, 2, i, na.rm=TRUE)
}
print(new_df)
  col_name      mean median         sd
1 col1name  52.46667  13.00   68.87854
2 col2name 726.55000 726.55 1010.38488
3 col3name 274.80000 274.80  367.97837
Upvotes: 0
Reputation: 79228
First, make sure all your columns are numeric; they might look numeric without actually being so. sapply(data, class) returns the class of each column, as does str(data). To solve this problem:
data <- rapply(data, as.numeric, how="replace")
Now you can apply your code to the data.
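One caveat: calling as.numeric directly on a factor returns its internal integer codes, not the numbers shown in the labels. A minimal sketch that goes through as.character first is below. The example data and the names non_numeric, col_classes and col_means are made up for illustration, and it assumes the non-numeric columns really do contain numbers stored as text.

```r
# Hypothetical data: b was read in as character, c as a factor
data <- data.frame(
  a = c(1.5, 2.5, NA),
  b = c("10", "20", "30"),
  c = factor(c("100", "200", "300")),
  stringsAsFactors = FALSE
)

# Find the columns that are not numeric yet
non_numeric <- names(data)[!sapply(data, is.numeric)]

# Convert via character so factors yield their labels, not their integer codes
data[non_numeric] <- lapply(data[non_numeric],
                            function(x) as.numeric(as.character(x)))

col_classes <- sapply(data, class)          # every column should now be "numeric"
col_means   <- colMeans(data, na.rm = TRUE) # column means, ignoring missing values
```

Any value that cannot be parsed as a number becomes NA with a warning, which is a convenient way to spot stray text in a column that was supposed to be numeric.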
Upvotes: 0