Kara

Reputation: 135

Create a table for N, Min/Max, SD, Mean, and Median in R

I'm very new to R, so please bear with me on this basic question. I have a dataset, DATA, that I created using the data.table package. I generated 200 random numbers between 0 and 1, repeated that 10000 times, and then built a data table with descriptive statistics for each iteration. My code looked like this:

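library(data.table)
rndm <- runif(200, min=0, max=1)  # a single sample of 200 draws, kept for reference
# 10000 iterations of 200 draws each, labelled by iteration number
reps <- data.table(x=runif(200*10000), iter=rep(1:10000, each=200))
# per-iteration statistics, computed on column x
DATA <- reps[, list(mean=mean(x), median=median(x), sd=sd(x), min=min(x),
                    max=max(x)), by=iter]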

The data looks something like this:

    Mean    Median     SD    Min    Max
1   0.521    0.499   0.287  0.010  0.998
2   0.511    0.502   0.290  0.009  0.996
.    ...     ... 

etc.

What I want to do is create a table that finds N, mean, median, standard deviation, minimum, and maximum of the accumulated sample means (not of each column like above). I need the output to look something like this:

   N     Mean   Median    SD    Min    Max
 10000  .502     .499    .280  .002   .999

How can I accomplish this?

Upvotes: 4

Views: 12569

Answers (2)

Peter Fine

Reputation: 2923

At the moment, the functions in your list are evaluated separately for each distinct value of iter. If you want the aggregate stats, just remove the by clause, and the functions will run once, over the whole dataset. Then add an item for N, using the .N variable that data.table provides.

DATA <- reps[, list(N=.N, mean=mean(x), median=median(x),
                    sd=sd(x), min=min(x), max=max(x))]
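
To make the effect of dropping the by clause concrete, here is a minimal sketch on a toy table. The names toy, per_iter and overall, the seed, and the table size are assumptions made up for illustration; they are not from the question.

```r
library(data.table)

# Hypothetical toy table: 3 iterations of 4 uniform draws each
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
toy <- data.table(x = runif(12), iter = rep(1:3, each = 4))

# With `by`, the list is evaluated once per group: one row per iteration
per_iter <- toy[, list(N = .N, mean = mean(x)), by = iter]

# Without `by`, the list is evaluated once over all rows: a single summary row
overall <- toy[, list(N = .N, mean = mean(x))]

print(per_iter)
print(overall)
```

Note that without by, .N counts every row of the table, so this summarises all the raw draws. To summarise the per-iteration means instead, the same one-row call would be run on the table of per-iteration results.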

Upvotes: 4

Frank

Reputation: 66819

You could also define a function. This approach allows you to make the same table for a different variable.

summaryfun <- function(x) list(N=length(x), Mean=mean(x), Median=median(x),
                               SD=sd(x), Min=min(x), Max=max(x))
DATA[, summaryfun(mean)]
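
For reference, the whole pipeline can be sketched end to end as below. This is only a sketch under assumptions: the seed and the names n_draws, n_iter, per_iter, means_summary and medians_summary are invented here, and the draws are read from a column x with one iter label per block of 200 draws, which is how the question describes the simulation.

```r
library(data.table)

set.seed(42)      # arbitrary seed, only for reproducibility
n_draws <- 200    # draws per iteration, as described in the question
n_iter  <- 10000  # number of iterations, as described in the question

# One row per draw, labelled with the iteration it belongs to
reps <- data.table(x = runif(n_draws * n_iter),
                   iter = rep(seq_len(n_iter), each = n_draws))

# Step 1: descriptive statistics for each iteration (one row per iter)
per_iter <- reps[, list(mean = mean(x), median = median(x), sd = sd(x),
                        min = min(x), max = max(x)), by = iter]

# Step 2: a reusable summary function, applied to one column of per_iter
summaryfun <- function(x) list(N = length(x), Mean = mean(x), Median = median(x),
                               SD = sd(x), Min = min(x), Max = max(x))

means_summary   <- per_iter[, summaryfun(mean)]    # summary of the sample means
medians_summary <- per_iter[, summaryfun(median)]  # same table for another column

print(means_summary)
```

Because each sample mean averages 200 draws, the SD in means_summary should come out near 0.02 (roughly 0.289 / sqrt(200)), much smaller than the per-iteration SD of about 0.29.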

Upvotes: 7
