Reputation: 55
I have written some code to create my own descriptive statistics table since the default summary
doesn't do what I want.
Now I would like to create a flexible, dynamic function that does this with a varying number of variables.
My code looks like this:
N <- c( length(data1), length(data2), length(data3) )
mean<- c( mean(data1), mean(data2), mean(data3) )
sd <- c( sd(data1), sd(data2), sd(data3) )
min <- c( min(data1), min(data2), min(data3) )
max <- c( max(data1), max(data2), max(data3) )
q <- data.frame(N, mean, sd, min, max)
print(q)
So instead of editing this whenever I want descriptives for something other than 3 variables, I would like a function that does something like this:
descriptive <- function(data1, ...) {
N <- c( length(data1), length(...) )
mean<- c( mean(data1), mean(...) )
sd <- c( sd(data1), sd(...) )
min <- c( min(data1), min(...) )
max <- c( max(data1), max(...) )
q <- data.frame(N, mean, sd, min, max)
print(q)
}
I tried the above and hoped it would work, but it only works with two variables. As you might see, I am new to R. I have tried to search for a solution, but I've not been able to find one. But if R is as good as "they" say, I think something like this should be possible.
There's probably a function that already does this, but I would like to be able to do it myself. (: Hope someone can help me!
EDIT!!
Thank you all for your answers, they all seem to work. This shows there are multiple answers to the same question in R. I don't know if you get points for the accepted answer and if this is important, but I chose Arun's answer since it comes closest to my aim of creating a descriptive table that is "good looking" and flexible.
If anyone in the future is interested, I've added this to Arun's answer, which makes it fit my purpose perfectly:
data <- list(var1, var2 ...)
names <- c("name1", "name2", "...")
descriptive(data)
This solution also seems to have the benefit of handling variables of different lengths, which a data frame cannot hold.
Upvotes: 3
Views: 225
Reputation: 44555
This would be a good opportunity to learn the apply family of functions, so that you can specify your intended output as a function and then apply that to a dataframe.
mydf <- data.frame(x=rnorm(1000), y=rnorm(1000), z=rnorm(1000)) # example data
descriptive <- function(x)
c(length=length(x), mean=mean(x), sd=sd(x), min=min(x), max=max(x))
sapply(mydf, descriptive) # apply `descriptive` to the df
The output:
                   x             y             z
length  1.000000e+03 1000.00000000 1000.00000000
mean    3.846765e-03   -0.02009427    0.02001385
sd      9.818488e-01    0.97662850    1.01543571
min    -2.905149e+00   -3.25904432   -3.33017918
max     3.235993e+00    2.86892044    3.13183601
One caution: unless you develop a more sophisticated descriptive function, it won't handle NA values in your data, and it will cause problems for variables of other classes in the dataframe (e.g., the mean of a character vector is NA).
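The caution above about NA values and mixed column classes can be addressed in a few lines. The following is only a minimal sketch, not part of the original answer: the names descriptive_na and mydf2 are hypothetical, and it assumes that silently dropping missing values and skipping non-numeric columns is the behaviour you want.

```r
# Sketch: an NA-tolerant variant of the descriptive function
# (descriptive_na is a hypothetical name, not from the answer above).
descriptive_na <- function(x) {
  x <- x[!is.na(x)]  # drop missing values before computing anything
  c(length = length(x), mean = mean(x), sd = sd(x), min = min(x), max = max(x))
}

# Example data with one NA and one non-numeric column.
mydf2 <- data.frame(x = c(1, 2, NA, 4),
                    y = c(10, 20, 30, 40),
                    g = c("a", "b", "c", "d"))

# Keep only the numeric columns, then apply the function column by column.
num_cols <- vapply(mydf2, is.numeric, logical(1))
res <- sapply(mydf2[num_cols], descriptive_na)
print(res)
```

Note that length here counts only the non-missing observations, which may or may not be what you want to report as N.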
This is also more efficient than building a function that internally applies to a list of vectors (as Arun suggests) and than plyr (from Baptiste: ldply(mydf, each(length, mean, sd, min, max))):
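# thomas, arun and baptiste are assumed to be the three approaches wrapped as functions
library(microbenchmark)
mydf <- data.frame(x=rnorm(1e5),y=rnorm(1e5),z=rnorm(1e5))
microbenchmark(sapply(mydf,thomas), arun(mydf), baptiste(mydf))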
Unit: milliseconds
                 expr       min        lq    median        uq      max neval
 sapply(mydf, thomas)  5.693252  6.039458  7.139658  7.953309 43.32675   100
           arun(mydf) 15.805778 18.522889 19.417559 22.016125 57.93630   100
       baptiste(mydf) 10.995073 11.597998 12.666252 13.861521 47.85533   100
Upvotes: 4
Reputation: 72759
If you really want to be able to use ...:
test <- list( seq(10), seq(5) )
descriptiveRow <- function(x) {
res <- c(length(x), mean(x), sd(x), min(x), max(x))
names(res) <- c("N","Mean","SD","Min","Max")
res
}
descriptive <- function( ... ) {
l <- list(...)
res <- as.data.frame( lapply( l, descriptiveRow ) )
colnames(res) <- seq(ncol(res))
res
}
descriptive(test[[1]], test[[2]])
> descriptive(test[[1]], test[[2]])
            1        2
N    10.00000 5.000000
Mean  5.50000 3.000000
SD    3.02765 1.581139
Min   1.00000 1.000000
Max  10.00000 5.000000
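If you would like the columns labelled by variable rather than by position, one possible variation is to reuse the argument names passed through ... as column names. This is a sketch under assumptions, not the answer's own code: descriptive_named is a hypothetical name, and unnamed arguments simply fall back to their position.

```r
# Sketch: use the names given in ... as column labels
# (descriptive_named is a hypothetical name).
descriptive_named <- function(...) {
  l <- list(...)
  # Fall back to positional labels for any argument passed without a name.
  nms <- names(l)
  if (is.null(nms)) nms <- rep("", length(l))
  nms[nms == ""] <- as.character(seq_along(l))[nms == ""]
  # One column of statistics per input vector.
  stats <- sapply(l, function(x) {
    c(N = length(x), Mean = mean(x), SD = sd(x), Min = min(x), Max = max(x))
  })
  colnames(stats) <- nms
  as.data.frame(stats)
}

out <- descriptive_named(a = seq(10), seq(5))
print(out)
```

Here the first column is labelled a and the second, which was passed without a name, is labelled by its position.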
Upvotes: 3
Reputation: 118839
You can pass a list as the argument to your function and then use sapply to compute each statistic for every element of the list.
descriptive <- function(ll) {
N <- sapply(ll, length)
mean <- sapply(ll, mean)
sd <- sapply(ll, sd)
min <- sapply(ll, min)
max <- sapply(ll, max)
print(out <- data.frame(N, mean, sd, min, max))
}
descriptive(list(1:5, 6:10))
  N mean       sd min max
1 5    3 1.581139   1   5
2 5    8 1.581139   6  10
Note: This'll work even if your input is a data.frame and you require statistics on all columns of your data.frame (as it's internally a list).
descriptive(data.frame(1:5, 6:10))
      N mean       sd min max
X1.5  5    3 1.581139   1   5
X6.10 5    8 1.581139   6  10
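Because the input is a plain list, its elements may have different lengths, and naming them gives readable row labels. The sketch below illustrates this; descriptive_tbl and vars are hypothetical names, and the function returns the table rather than only printing it, which is an assumption about what is convenient and not part of the answer above.

```r
# Sketch: same idea as the list-based function above, but returning the table
# (descriptive_tbl is a hypothetical name).
descriptive_tbl <- function(ll) {
  data.frame(
    N    = sapply(ll, length),
    mean = sapply(ll, mean),
    sd   = sapply(ll, sd),
    min  = sapply(ll, min),
    max  = sapply(ll, max)
  )
}

# A named list whose elements have different lengths.
vars <- list(short = 1:5, long = 1:20)
tbl <- descriptive_tbl(vars)
print(tbl)
```

The list names become the row names of the result, since data.frame picks them up from the named vectors that sapply returns.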
Upvotes: 3