Lyndz
Lyndz

Reputation: 423

Average per interval using dplyr in R

I would like to get the average of the different variables in a data frame over a specified intervals.

My data has varying pressure level (pres column in the following data). I would like to bin this in such a way that the interval is 50.

0,50,100,150,200,250,300,350,400,450,500,550,600,650,700,750,800,850,900,950,1000

Then I want to get the average of the variables (say temp) in each interval. Should be NA if no data falls within the interval.

Here's my data

structure(list(pres = c(1009L, 1000L, 850L, 811L, 808L, 683L, 
663L, 630L, 498L, 477L, 471L, 468L, 357L, 351L, 348L, 196L, 142L, 
102L, 100L), hgt = c(5L, 85L, 1508L, 1909L, 1941L, 3350L, 3596L, 
4016L, 5906L, 6243L, 6341L, 6391L, 8446L, 8571L, 8635L, 12601L, 
14588L, 16497L, 16608L), temp = c(28.8, 28.2, 18, 15.2, 15, 9, 
8.2, 6.8, -4.9, -7.1, -7.9, -8.1, -20.5, -20.3, -20.3, -54.1, 
-70.9, -81.3, -80.7), dewpt = c(23.8, 22.2, 15.6, 12.5, 12.5, 
-4, -4.8, -8.2, -18.9, -17.1, -12.8, -11.2, -24.6, -29.3, -28.3, 
-61.1, -80.9, -91.3, -91.7), relh = c(74L, 70L, 86L, 84L, 85L, 
40L, 39L, 33L, 33L, 45L, 68L, 78L, 70L, 44L, 49L, 42L, 22L, 18L, 
15L), mixr = c(18.8, 17.17, 13.29, 11.35, 11.4, 4.18, 4.06, 3.29, 
1.73, 2.11, 3.05, 3.49, 1.47, 0.97, 1.07, 0.05, 0, 0, 0), that = c(301.2, 
301.4, 305, 306.1, 306.2, 314.6, 316.4, 319.5, 327.4, 328.7, 
328.9, 329.3, 339.1, 341, 341.9, 348.9, 353.3, 368.3, 371.6), 
thte = c(356.8, 352.1, 344.9, 340.5, 340.8, 328.2, 329.7, 
330.5, 333.6, 336.2, 339.6, 341.4, 344.6, 344.8, 346, 349.2, 
353.3, 368.3, 371.6), thtv = c(304.6, 304.4, 307.4, 308.2, 
308.3, 315.4, 317.2, 320.1, 327.7, 329.1, 329.5, 330, 339.4, 
341.2, 342.1, 349, 353.3, 368.3, 371.6)), class = "data.frame", row.names = c(NA, 
-19L))

Expected output:

A csv file containing the pressure level (interval of 50) and the average of the other variables.

I'll appreciate any help.

Upvotes: 0

Views: 232

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 389275

You could use cut/findInterval to create groups and then use summarise_all to get mean of all other variables.

library(dplyr)
df1 <- df %>% group_by(pres = cut(pres, seq(0, 1000, 50))) %>% summarise_all(mean)
df1
# A tibble: 11 x 9
#   pres           hgt  temp dewpt  relh  mixr  that  thte  thtv
#   <fct>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 (50,100]    16608  -80.7 -91.7  15    0     372.  372.  372.
# 2 (100,150]   15542. -76.1 -86.1  20    0     361.  361.  361.
# 3 (150,200]   12601  -54.1 -61.1  42    0.05  349.  349.  349 
# 4 (300,350]    8635  -20.3 -28.3  49    1.07  342.  346   342.
# 5 (350,400]    8508. -20.4 -27.0  57    1.22  340.  345.  340.
# 6 (450,500]    6220.  -7   -15    56    2.60  329.  338.  329.
# 7 (600,650]    4016    6.8  -8.2  33    3.29  320.  330.  320.
# 8 (650,700]    3473    8.6  -4.4  39.5  4.12  316.  329.  316.
# 9 (800,850]    1786   16.1  13.5  85   12.0   306.  342.  308.
#10 (950,1e+03]    85   28.2  22.2  70   17.2   301.  352.  304.
#11 NA              5   28.8  23.8  74   18.8   301.  357.  305.

df1 can be written as csv

write.csv(df1, "output.csv", row.names = FALSE)

Upvotes: 4

Related Questions