Reputation: 375
I have two data frames "Conc" and "Flow".
Flow has a value for every day for a set period, whereas Conc only has a value on certain days over the period of the period.
What I want to be able to do is calculate the mean Flow values for each period in between the Conc values using r.
The following code will generate two example data frames to illustrate the kind of datasets I am working with:
Conc <- data.frame(Date = as.Date(c("2012/01/13", "2012/02/16", "2012/05/02", "2012/07/28",
"2012/11/10")), Conc = c(0.88, 0.55, 0.34, 0.21, 0.98))
Flow <- data.frame(Date = c(seq(as.Date("2012/01/01"), by = "day", length.out = 365)),
Flow = c(sample(seq(from = 0.01, to = 5, by = 0.1), size = 365, replace = TRUE)))
The output data frame would ideally be something like:
Period Mean_Flow
1 2.01
2 1.41
3 3.81
4 0.31
I appreciate the variable time between Conc days makes this tricky. At present the best I have been to come up with is to manually do this in excel but I would really like to find an R solution to save myself having to do this for about 10 different dataset that I have.
Thank you.
Upvotes: 5
Views: 702
Reputation: 4921
The following uses a loop going along all available dates in Conc$Date. Conc$Date is for convenience put in vector A. The variable p signifies the values that should be taken into account. The loop stops with a NaN as the loop surpasses the last given date.
A <- Conc$Date
for(i in 1:length(A))
{p <- which(Flow$Date>A[i] & Flow$Date<A[i+1])
M<-mean(Flow$Flow[p])
print(M)}
Upvotes: 0
Reputation: 92302
Here's a possible approach using data.table
package foverlaps
function:
Create time intervals in both data sets
library(data.table)
Conc <- setDT(Conc)[, `:=`(start = Date, end = c(Date[2:(.N - 1)] - 1, Date[.N], NA))][-.N]
Flow <- setDT(Flow)[, `:=`(start = Date, end = Date)]
Key the Flow
data set in order to use foverlaps
function and run the function
setkey(Flow, start, end)
overlaps <- foverlaps(Conc, Flow, type = "any", which = TRUE)
Create indexes of the overlaps within the Flow
data set and compute the mean by those indexes
Flow[overlaps$yid, Period := overlaps$xid]
na.omit(Flow[, list(Mean_Flow = mean(Flow)), by = Period])
# Period Mean_Flow
# 1: 1 2.189412
# 2: 2 2.263947
# 3: 3 2.762874
# 4: 4 2.349048
Upvotes: 6