user6450579
user6450579

Reputation:

Group rows in a dataframe, take the max values and compute the groups mean

I have got the following dataframe:

df <- read.table(text = "
    Station    Area  Day       Flow
1     21009 4390.00  1913   725.661
2     24005  178.50  1913    25.540
3     25009 1264.00  1913   318.481
4     27002  758.90  1913   230.556
5     38003  133.90  1913     2.823
70    37006  228.40  9526    68.140
71    37008  190.30  9526    38.175
72    37009   60.70  9526    12.200
73    37018   47.90  9526    19.209
156   38021   42.20  12551    15.330
157   39093  117.60  12551    33.090
158   40004  206.00  12551    49.019
159   41005  180.90  12551    63.574
188   41006   87.80  14329    67.130
189   41011  154.00  14329   125.000
190   41012   93.30  14329    43.640
191   41014  379.00  14329   218.000
192   41015   58.30  14329     5.080", header = TRUE)

I would like to apply the dplyr package (should be the quickest way) in order to group the rows by the column Day, then take the max and the mean value of Area, while keeping the corresponding observations in the other 2 columns.

My output should be:

    Station    Area  Day        Flow  Group_mean
1     21009 4390.00  1913    725.661    1345.06
70    37006  228.40  9526     68.140     131.83 
158   40004  206.00  12551    49.019     136.68
191   41014  379.00  14329   218.000     154.48

Upvotes: 2

Views: 698

Answers (2)

akrun
akrun

Reputation: 887078

We can use dplyr. After grouping by 'Day', we summarise the data by taking the first value of 'Area', value of 'Station', where the 'Flow' is maximum, get the max 'Flow' and also the mean of 'Area'.

library(dplyr)
df %>%
   group_by(Day) %>%
   summarise(Area1 = max(Area), 
             Station = Station[which.max(Flow)], 
             Flow = max(Flow), 
             Group_mean = mean(Area)) %>%
             rename(Area = Area1)
#     Day   Area Station    Flow Group_mean
#   <int>  <dbl>   <int>   <dbl>      <dbl>
#1  1913 4390.0   21009 725.661   1345.060
#2  9526  228.4   37006  68.140    131.825
#3 12551  206.0    41005  63.574    136.675
#4 14329  379.0    41014 218.000    154.480

EDIT: Based on @agenis and @Sotos comments

Upvotes: 4

Chirayu Chamoli
Chirayu Chamoli

Reputation: 2076

You could use sqldf by

library(sqldf)
sqldf("select Station, max(Area), Day, Flow, sum(Area)/count(*) as mean from df group by Day")

Upvotes: 3

Related Questions