student2017
student2017

Reputation: 3

Calculating quantiles in groups of 20 in large data frame R

I am trying to calculate quantiles in groups of 20 rows for a very large dataset (over 40,000 rows). I would also like for the results to be written in one .csv file.

I am able to calculate what I need using slice notation like this:

    my_data<-read.csv(file.choose(),header=TRUE)

    q1<-my_data[1:20,"Q"]
    q2<-my_data[21:40,"Q"]

    quant1<-quantile(q1,c(0.5,0.75,0.8,0.9,0.95))
    quant2<-quantile(q2,c(0.5,0.75,0.8,0.9,0.95))

    d=data.frame(quant1,quant2)

    write.csv(d,file="q_values.csv")

however, this will get extremely cumbersome and time-consuming to re-write these lines for the entire dataset. I'd like to somehow iterate through the data so that it calculates what I need for every 20 rows, but I cannot seem to understand how to do this. I've read through help files for "for" loops in R, but I'm still at novice level when it comes to coding and would greatly appreciate some guidance on how to do this. Thank you

Upvotes: 0

Views: 160

Answers (1)

Eric Watt
Eric Watt

Reputation: 3240

If you add a column to index by, you can do this pretty easily. Here is an example using data.table.

dat <- data.table(Q = rnorm(40000))
dat[, R := rep(1:(.N/20), each = 20)]
dat[, .(quant_0.5 = quantile(Q, probs = c(0.5)),
        quant_0.75 = quantile(Q, probs = c(0.75)),
        quant_0.8 = quantile(Q, probs = c(0.8)),
        quant_0.9 = quantile(Q, probs = c(0.9)),
        quant_0.95 = quantile(Q, probs = c(0.95))), 
    by = R]

Which gives result:

         R    quant_0.5 quant_0.75 quant_0.8 quant_0.9 quant_0.95
   1:    1 -0.123822327  0.4609870 0.5784939 1.0898441  1.1224632
   2:    2 -0.251293742  0.3701377 0.7802016 1.0747215  1.5514140
   3:    3 -0.070979910  0.4268033 0.5546480 1.4477840  1.5304469
   4:    4  0.177552739  0.8687846 1.0001809 1.3883132  1.5394739
   5:    5  0.515836825  0.9611607 1.1268148 1.3396512  1.5087827
  ---                                                            
1996: 1996  0.566311407  1.0667204 1.3171846 1.5641837  1.8594775
1997: 1997  0.009336622  0.2859035 0.3397875 0.5472635  0.8108932
1998: 1998  0.514867828  1.0330679 1.0534716 1.2528384  2.0933062
1999: 1999  0.247092220  0.7501609 0.8924200 1.1446394  1.4736887
2000: 2000 -0.076496868  0.6816951 0.7430764 0.8362260  1.1001702

Upvotes: 2

Related Questions