user11916948
user11916948

Reputation: 954

Calculate area under the curve for time serie data

I want to calculate the area under the curve for the time points for each id and column. Any suggestions? Which R packages to use? Many thanks!

id <- rep(1:3,each=5)
time <- rep(c(10,20,30,40,50),3)
q1 <- sample(100,15, replace=T)
q2 <- sample(100,15, replace=T)
q3 <- sample(100,15, replace=T)

df <- data.frame(id,time,q1,q2,q3)
df

   id time q1 q2 q3
   1   10 38 55 38
   1   20 46 29 88
   1   30 16 28 97
   1   40 37 20 81
   1   50 59 27 42
   2   10 82 81 54
   2   20 45  3 23
   2   30 82 67 59
   2   40 27  3 42
   2   50 45 71 45
   3   10 39  8 29
   3   20 12  6 90
   3   30 92 11  7
   3   40 52  8 37
   3   50 81 57 80

Wanted output, something like this:
    q1   q2   q3
1 area area area
2 area area area
3 area area area

Upvotes: 1

Views: 754

Answers (2)

Peace Wang
Peace Wang

Reputation: 2419

Here I will use the trapz function to calculate the integral.

library(data.table)
library(caTools) # integrate with its trapz function
# data
df <- fread("id time q1 q2 q3
   1   10 38 55 38
   1   20 46 29 88
   1   30 16 28 97
   1   40 37 20 81
   1   50 59 27 42
   2   10 82 81 54
   2   20 45  3 23
   2   30 82 67 59
   2   40 27  3 42
   2   50 45 71 45
   3   10 39  8 29
   3   20 12  6 90
   3   30 92 11  7
   3   40 52  8 37
   3   50 81 57 80")

# calculate the area with `trapz`
df[,lapply(.SD[,2:4], function(y) trapz(time,y)),by=id]
#>    id   q1   q2   q3
#> 1:  1 1475 1180 3060
#> 2:  2 2175 1490 1735
#> 3:  3 2160  575 1885

Created on 2021-06-30 by the reprex package (v2.0.0)

Upvotes: 1

Andy Eggers
Andy Eggers

Reputation: 612

library(tidyverse)

id <- rep(1:3,each=5)
time <- rep(c(10,20,30,40,50),3)
q1 <- sample(100,15, replace=T)
q2 <- sample(100,15, replace=T)
q3 <- sample(100,15, replace=T)

df <- data.frame(id,time,q1,q2,q3)

df %>% 
  arrange(time) %>% 
  pivot_longer(cols = c(q1, q2, q3)) -> longer_df

longer_df %>% 
  ggplot(aes(x = time, y = value, col = factor(id))) + 
  geom_line() + 
  geom_point() + 
  facet_wrap(. ~ name)


longer_df %>% 
  group_by(id, name) %>% 
  mutate(lag_value = lag(value),
         midpoint_value = (value + lag_value)/2) %>% 
  summarize(area = 10*sum(midpoint_value, na.rm = T)) %>% 
  pivot_wider(values_from = area)
#> `summarise()` has grouped output by 'id'. You can override using the `.groups` argument.
#> # A tibble: 3 x 4
#> # Groups:   id [3]
#>      id    q1    q2    q3
#>   <int> <dbl> <dbl> <dbl>
#> 1     1  1960  1980  2075
#> 2     2  1025  2215  2180
#> 3     3  2105  1590  2110

Created on 2021-06-30 by the reprex package (v2.0.0)

Upvotes: 1

Related Questions