user2624239

Reputation: 55

Flexible functions in R

I have written some code to create my own descriptive statistics table, since the default summary() doesn't do what I want.

Now I would like to turn this into a flexible function that works for a varying number of variables.

My code looks like this:

N    <- c( length(data1), length(data2), length(data3) )
mean <- c( mean(data1), mean(data2), mean(data3) )
sd   <- c( sd(data1), sd(data2), sd(data3) )
min  <- c( min(data1), min(data2), min(data3) )
max  <- c( max(data1), max(data2), max(data3) )
q <- data.frame(N, mean, sd, min, max)  # one row per variable
print(q)

So instead of editing this code whenever I want descriptive statistics for something other than three variables, I would like a function that does something like this:

descriptive <- function(data1, ...) {
  N <- c( length(data1), length(...) ) 
  mean<- c( mean(data1), mean(...) )
  sd <- c( sd(data1), sd(...) )
  min <- c( min(data1), min(...) )
  max <- c( max(data1), max(...) )
  q <- data.frame(N, mean, sd, min, max)
  print(q)
}

I tried the above and hoped it would work, but it only works with two variables. As you can probably tell, I am new to R. I have searched for a solution but haven't been able to find one. But if R is as good as "they" say, I think something like this should be possible.
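For context on why the attempt breaks: length() accepts exactly one argument, so length(...) fails as soon as ... holds two or more variables; mean() would take the third variable as its trim argument; and min() and max() pool all of their arguments into a single result rather than returning one value per variable. The usual way to handle any number of arguments is to collect them with list(...) and apply a function to each element. A minimal sketch, where count_args and lengths_of are hypothetical names used only for illustration:

```r
# Collect however many arguments were passed into one list
count_args <- function(...) {
  args <- list(...)
  length(args)
}

# Apply one statistic to each collected argument separately
lengths_of <- function(...) sapply(list(...), length)

count_args(1:3, 4:6, 7:9)   # 3
lengths_of(1:3, 1:5, 1:2)   # 3 5 2
```

The answers below build on the same idea, either by passing a list directly or by collecting ... into a list inside the function.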

There's probably a function that already does this, but I would like to be able to do it myself. (: I hope someone can help me!

EDIT!!

Thank you all for your answers; they all seem to work. This shows there are multiple answers to the same question in R. I don't know whether you get points for the accepted answer or whether that matters, but I chose Arun's answer since it comes closest to my aim of creating a descriptive table that is "good looking" and flexible.

If anyone in the future is interested, I've added this to Arun's answer, which makes it fit my purpose perfectly:

data <- list(var1, var2 ...)
names(data) <- c("name1", "name2", "...")
descriptive(data)

Compared with passing a data frame, this solution also has the benefit of handling variables of different lengths.
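To illustrate both points (row labels from names(data), and variables of different lengths), here is a small sketch. It repeats the list-based descriptive() from Arun's answer so the block runs on its own; var1, var2 and the names are made-up example data.

```r
# Copy of the list-based descriptive() from Arun's answer
descriptive <- function(ll) {
  N <- sapply(ll, length)
  mean <- sapply(ll, mean)
  sd <- sapply(ll, sd)
  min <- sapply(ll, min)
  max <- sapply(ll, max)
  print(out <- data.frame(N, mean, sd, min, max))
}

# Made-up variables of different lengths (a data frame could not hold both)
var1 <- c(1, 2, 3, 4, 5)
var2 <- c(2, 4, 6)

data <- list(var1, var2)
names(data) <- c("name1", "name2")  # sapply keeps these names; data.frame uses them as row names
res <- descriptive(data)             # prints the table and also returns it
```

Because print() returns its argument invisibly, res holds the same data frame that was printed, so the table can be reused afterwards.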

Upvotes: 3

Views: 225

Answers (3)

Thomas

Reputation: 44555

This would be a good opportunity to learn the apply family of functions, so that you can specify your intended output as a function and then apply that to a dataframe.

mydf <- data.frame(x=rnorm(1000), y=rnorm(1000), z=rnorm(1000)) # example data

descriptive <- function(x)
   c(length=length(x), mean=mean(x), sd=sd(x), min=min(x), max=max(x))

sapply(mydf, descriptive) # apply `descriptive` to the df

The output:

                   x             y             z
length  1.000000e+03 1000.00000000 1000.00000000
mean    3.846765e-03   -0.02009427    0.02001385
sd      9.818488e-01    0.97662850    1.01543571
min    -2.905149e+00   -3.25904432   -3.33017918
max     3.235993e+00    2.86892044    3.13183601
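sapply() puts the statistics in rows and the variables in columns. If you prefer one row per variable, as in the table from the question, one option is to transpose the result with t() and convert it to a data frame. A small sketch with made-up data so the numbers are easy to check:

```r
descriptive <- function(x)
  c(length=length(x), mean=mean(x), sd=sd(x), min=min(x), max=max(x))

# Small made-up example data
mydf <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))

# t() flips the matrix so each variable becomes a row,
# and as.data.frame() turns it into a regular data frame
tab <- as.data.frame(t(sapply(mydf, descriptive)))
tab
```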

One caution: unless you develop a more sophisticated descriptive function, this won't handle NA values (mean, sd, min and max all return NA for a column that contains an NA) and it will cause problems for variables of other classes in the data frame (e.g., the mean of a character vector is NA).
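A minimal sketch of such a more robust function, assuming you want missing values dropped via na.rm = TRUE and non-numeric columns skipped before applying it. descriptive_na and the example data are hypothetical:

```r
# Hypothetical NA-tolerant variant of descriptive():
# "length" counts non-missing values; the other statistics drop NAs.
descriptive_na <- function(x) {
  c(length = sum(!is.na(x)),
    mean   = mean(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE),
    min    = min(x, na.rm = TRUE),
    max    = max(x, na.rm = TRUE))
}

# Example data: an NA in x and a character column z
mydf <- data.frame(x = c(1, 2, NA, 4),
                   y = c(10, 20, 30, 40),
                   z = c("a", "b", "c", "d"))

# Keep only numeric columns, so z is skipped instead of producing NA
num <- vapply(mydf, is.numeric, logical(1))
res <- sapply(mydf[num], descriptive_na)
res
```

A column that is entirely NA would still give NaN, NA or Inf for these statistics (with warnings from min() and max()), so further checks may be needed depending on your data.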

This is also more efficient than building a function that internally loops over a list of vectors (as Arun suggests) or using plyr (from Baptiste: ldply(mydf, each(length, mean, sd, min, max))):

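library(microbenchmark)
mydf <- data.frame(x=rnorm(1e5),y=rnorm(1e5),z=rnorm(1e5))
# thomas = descriptive() above; arun = Arun's descriptive(); baptiste = function(df) plyr::ldply(df, plyr::each(length, mean, sd, min, max))
microbenchmark(sapply(mydf,thomas), arun(mydf), baptiste(mydf))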

Unit: milliseconds
                 expr       min        lq    median        uq      max neval
 sapply(mydf, thomas)  5.693252  6.039458  7.139658  7.953309 43.32675   100
           arun(mydf) 15.805778 18.522889 19.417559 22.016125 57.93630   100
       baptiste(mydf) 10.995073 11.597998 12.666252 13.861521 47.85533   100

Upvotes: 4

Ari B. Friedman

Reputation: 72759

If you really want to be able to use ...:

test <- list( seq(10), seq(5) )

descriptiveRow <- function(x) {
  res <- c(length(x), mean(x), sd(x), min(x), max(x))
  names(res) <- c("N","Mean","SD","Min","Max")
  res
}

descriptive <- function( ... ) {
  l <- list(...)
  res <- as.data.frame( lapply( l, descriptiveRow ) )
  colnames(res) <- seq(ncol(res))
  res
}

descriptive(test[[1]], test[[2]])

> descriptive(test[[1]], test[[2]])
            1        2
N    10.00000 5.000000
Mean  5.50000 3.000000
SD    3.02765 1.581139
Min   1.00000 1.000000
Max  10.00000 5.000000
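This version numbers the columns 1, 2, .... A possible extension, not part of the original answer, labels each column with the argument name if one is given (e.g. first = x) and otherwise with the expression that was passed, recovered with substitute(). descriptive2 is a hypothetical name, and it returns a matrix rather than a data frame:

```r
descriptiveRow <- function(x) {
  c(N = length(x), Mean = mean(x), SD = sd(x), Min = min(x), Max = max(x))
}

# Hypothetical variant that labels columns by argument name or expression
descriptive2 <- function(...) {
  l <- list(...)
  # The unevaluated argument expressions (e.g. test[[2]]) as strings
  exprs <- vapply(as.list(substitute(list(...)))[-1],
                  function(e) paste(deparse(e), collapse = " "),
                  character(1))
  labels <- names(l)
  if (is.null(labels)) {
    labels <- exprs                     # no argument was named
  } else {
    unnamed <- labels == ""
    labels[unnamed] <- exprs[unnamed]   # fill in only the missing names
  }
  res <- sapply(l, descriptiveRow)      # 5 x k matrix, one column per argument
  colnames(res) <- labels
  res
}

test <- list(seq(10), seq(5))
out <- descriptive2(first = test[[1]], test[[2]])
out
```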

Upvotes: 3

Arun

Reputation: 118839

You can pass a list as the function's argument and then use sapply to compute each statistic for every element of the list.

descriptive <- function(ll) {
    N <- sapply(ll, length)
    mean <- sapply(ll, mean)
    sd <- sapply(ll, sd)
    min <- sapply(ll, min)
    max <- sapply(ll, max)
    print(out <- data.frame(N, mean, sd, min, max))
}

descriptive(list(1:5, 6:10))

  N mean       sd min max
1 5    3 1.581139   1   5
2 5    8 1.581139   6  10

Note: this also works if your input is a data.frame and you want statistics on all of its columns (a data.frame is internally a list).

descriptive(data.frame(1:5, 6:10))
      N mean       sd min max
X1.5  5    3 1.581139   1   5
X6.10 5    8 1.581139   6  10

Upvotes: 3
