Ute Wollschlaeger
Ute Wollschlaeger

Reputation: 11

How to calculate mean from selected rows with specific index

I need to calculate mean values of selected rows of a data frame with the same index. In detail, I need the mean value of TOC for all LB <= 30 for the same PointID (alltogether 3400 PointIDs). The data frame looks like this:

PointId    LB        TOC 
2          10      21,83 
2          30      20,75 
2          50      11,51 
2          70       6,91 
2         100       1,57 
3          10      20,64 
3          30      15,89 
3          50       5,74 
3          70       5,79 
3         100       5,81 
4          10      19,15 
4          30      15,17 
4          50      22,2 
4          70      10,28
4         100       6,52
.....

Any help is greatly appreciated!

Upvotes: 1

Views: 1509

Answers (3)

s__
s__

Reputation: 9485

Something like this in dplyr:

library(dplyr)
df %>%
  filter(LB <= 30) %>%
  group_by(PointId) %>%
  summarise(mean_ = mean(TOC))

# A tibble: 3 x 2
  PointId mean_
    <int> <dbl>
1       2  21.3
2       3  18.3
3       4  17.2

Or with data.table:

library(data.table)
data.table(df)[LB <= 30, .(mean_ = mean(TOC)), by = PointId]

   PointId  mean_
1:       2 21.290
2:       3 18.265
3:       4 17.160

Or base R:

aggregate(TOC ~ PointId, data = df[df$LB <= 30,], mean)
  PointId    TOC
1       2 21.290
2       3 18.265
3       4 17.160

With data:

structure(list(PointId = c(2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 
3L, 4L, 4L, 4L, 4L, 4L), LB = c(10L, 30L, 50L, 70L, 100L, 10L, 
30L, 50L, 70L, 100L, 10L, 30L, 50L, 70L, 100L), TOC = c(21.83, 
20.75, 11.51, 6.91, 1.57, 20.64, 15.89, 5.74, 5.79, 5.81, 19.15, 
15.17, 22.2, 10.28, 6.52)), class = "data.frame", row.names = c(NA, 
-15L))

Upvotes: 1

dy_by
dy_by

Reputation: 1241

with data.table package is really easy

library(data.table)
library(dplyr)
dt <- dt %>% as.data.table()
dt[LB <= 30, mean(TOC), by = PointId]



   PointId     V1
1:       2 21.290
2:       3 18.265
3:       4 17.160

Upvotes: 0

Jason Mathews
Jason Mathews

Reputation: 805

You can filter first based on LB and then use summarise

df %>% group_by(PointId) %>% 
  filter(LB <= 30) %>% 
  summarise(Mean = mean(TOC, na.rm = TRUE))

# A tibble: 3 x 2
  PointId  Mean
    <int> <dbl>
1       2 2129 
2       3 1826.
3       4 1716 

Upvotes: 0

Related Questions