Flippo

Reputation: 25

Finding max of column by group with condition

I have a data frame like this:

(Image of the data frame, with columns Gill, Time, Diametre and the desired MaxTime; not reproduced here.)

For each Gill, I would like to find the maximum Time for which the Diametre is different from 0. I tried the aggregate function and the dplyr package, but could not get either to work. A combination of for, if and aggregate would probably work, but I could not figure out how to do it.

I'm not sure of the best way to approach this. I'd appreciate any help.

Upvotes: 1

Views: 200

Answers (3)

zackadavis

Reputation: 25

I would use a different approach from the elegant solution that akrun suggested. This method also creates the MaxTime column that you show in your image.

library(dplyr)  # provides mutate() and bind_rows()

# This will split your df into a list of data frames, one per Gill.
list.df <- split(df1, df1$Gill)

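Then you can use lapply to find the maximum Time for each Gill among the rows where Diametre is not 0, and store that value in a new column called MaxTime. Assign the result back to list.df, otherwise the new column is lost.

list.df <- lapply(list.df, function(x) mutate(x, MaxTime = max(x$Time[x$Diametre != 0])))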

Then you can combine these split data frames back together using bind_rows():

df1 <- bind_rows(list.df)
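
For reference, the same split, apply and combine steps can be sketched in base R only. The sample data and the helper name max_nonzero_time are made up for illustration (the question's data is only available as an image), and returning NA for a Gill whose Diametre is always 0 is an assumption about what you would want there; plain max() on an empty vector gives -Inf with a warning instead.

```r
# Hypothetical sample data; column names follow the question (Gill, Time, Diametre)
df1 <- data.frame(
  Gill     = rep(1:3, each = 3),
  Time     = rep(c(0.16, 0.33, 0.50), times = 3),
  Diametre = c(0, 0, 0,  1.2, 2.5, 0,  0.8, 1.1, 1.9)
)

# Hypothetical helper: max Time among rows with non-zero Diametre, NA if none
max_nonzero_time <- function(time, diam) {
  keep <- diam != 0
  if (any(keep)) max(time[keep]) else NA_real_
}

# Split by Gill, add MaxTime to each piece, then stack the pieces again
list.df <- split(df1, df1$Gill)
list.df <- lapply(list.df, function(x) {
  x$MaxTime <- max_nonzero_time(x$Time, x$Diametre)
  x
})
result <- do.call(rbind, list.df)
rownames(result) <- NULL
```

In this sketch Gill 1 has no non-zero Diametre, so its MaxTime is NA rather than -Inf.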

Upvotes: 0

see-king_of_knowledge

Reputation: 523

Here is how you can use aggregate:

> df<- data.frame(
    Gill = rep(1:11, each = 2),
    diameter = c(0,0,1,0,0,0,73.36, 80.08,1,25.2,53.48,61.21,28.8,28.66,71.2,80.25,44.55,53.50,60.91,0,11,74.22),
    time = 0.16
  )
> df
   Gill diameter time
1     1     0.00 0.16
2     1     0.00 0.16
3     2     1.00 0.16
4     2     0.00 0.16
5     3     0.00 0.16
6     3     0.00 0.16
7     4    73.36 0.16
8     4    80.08 0.16
9     5     1.00 0.16
10    5    25.20 0.16
11    6    53.48 0.16
12    6    61.21 0.16
13    7    28.80 0.16
14    7    28.66 0.16
15    8    71.20 0.16
16    8    80.25 0.16
17    9    44.55 0.16
18    9    53.50 0.16
19   10    60.91 0.16
20   10     0.00 0.16
21   11    11.00 0.16
22   11    74.22 0.16
> # Remove diameter == 0 before aggregate
> dfnew <- df[df$diameter != 0, ]
> aggregate(dfnew$time, list(dfnew$Gill), max )
  Group.1    x
1       2 0.16
2       4 0.16
3       5 0.16
4       6 0.16
5       7 0.16
6       8 0.16
7       9 0.16
8      10 0.16
9      11 0.16
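
A possible variation, as a sketch: the formula interface of aggregate takes a subset argument, so the filtering and grouping can be done in one call and the result keeps readable column names. The data below is hypothetical, with a time column that actually varies. Gills whose diameter is always 0 are dropped by the filter (as Gill 1 and 3 are in the output above); the merge step is one assumed way to bring them back as NA if you need every Gill in the result.

```r
# Hypothetical sample data with a time column that varies within each Gill
df <- data.frame(
  Gill     = rep(1:3, each = 3),
  time     = rep(c(0.16, 0.33, 0.50), times = 3),
  diameter = c(0, 0, 0,  1.2, 2.5, 0,  0.8, 1.1, 1.9)
)

# Max time per Gill, using only rows where diameter is not 0
res <- aggregate(time ~ Gill, data = df, subset = diameter != 0, FUN = max)

# Optional: keep Gills that were filtered out entirely, with NA as their time
all_gills <- merge(data.frame(Gill = unique(df$Gill)), res, all.x = TRUE)
```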

Upvotes: 0

akrun

Reputation: 887951

After grouping by 'Gill', subset 'Time' to the rows where 'Diametre' is not 0 and take the max (assuming 'Time' is of numeric class):

library(dplyr)
df1 %>%
  group_by(Gill) %>%
  summarise(Time = max(Time[Diametre != 0]))
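
One thing to watch for: if a Gill has no row with a non-zero 'Diametre', max() is called on an empty vector and returns -Inf with a warning. A minimal sketch of one way to guard against that, using made-up sample data and assuming NA is the value wanted for such groups:

```r
library(dplyr)

# Hypothetical sample data; Gill 1 has Diametre equal to 0 in every row
df1 <- data.frame(
  Gill     = rep(1:3, each = 3),
  Time     = rep(c(0.16, 0.33, 0.50), times = 3),
  Diametre = c(0, 0, 0,  1.2, 2.5, 0,  0.8, 1.1, 1.9)
)

result <- df1 %>%
  group_by(Gill) %>%
  summarise(
    # Return NA instead of -Inf when the group has no non-zero Diametre
    Time = if (any(Diametre != 0)) max(Time[Diametre != 0]) else NA_real_
  )
```

Using mutate() in place of summarise() would keep every row and add the per-Gill value as a column instead of collapsing to one row per Gill.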

Upvotes: 1
