Reputation: 147
I really don't know where to start with this one, so I'm asking here. I have two data frames:
set.seed(21)
DF1 <- data.frame(year = c(seq(2000,2012,by=1)),
C1 = runif(13,0,1),
C2 = runif(13,0,1),
C3 = runif(13,0,1),
C4 = runif(13,0,1),
C5 = runif(13,0,1))
DF2 <- data.frame(column = c("C1", "C2", "C3", "C4", "C5"),
start = c(2005,2001,2006,2005,2009),
end = c(2012,2009,2011,2010,2012))
I need to write a function that has the following steps:
For each row in DF2: Take the corresponding column in DF2$column and take a mean from DF1.
For example: in DF1$C1, take a mean of values between 2005 and 2012
Report: DF2[1,1], DF2[1,2], DF2[1,3], mean 1
Values that are less than the available data, for example: 2002 - 5 = 1997 but are not available in DF1 would be OK as NA.
An example output:
> DF2.out
column start end m1
1 C1 2005 2012 0.9186834
2 C2 2001 2009 NA
Thank you for your help in advance!
Upvotes: 0
Views: 79
Reputation: 6969
You can use mapply
to wrap the loop over DF2
rows:
library(data.table) # using for convenience
DT <- data.table(DF1)
res <- mapply(function(c, start, end) {
res <- DT[year >= start & year <= end, mean(get(c))]
return (res)
} , as.character(DF2$column), DF2$start, DF2$end)
res <- data.frame(res)
res$column <- rownames(res)
res <- merge(DF2, res)
res
# column start end res
#1 C1 2005 2012 0.5861268
#2 C2 2001 2009 0.3942018
#3 C3 2006 2011 0.5853924
#4 C4 2005 2010 0.4904493
#5 C5 2009 2012 0.6783216
Upvotes: 1
Reputation: 93813
Another attempt using mapply
that should be pretty quick as it is just a bit of matrix indexing and selecting:
column <- match(DF2$column, names(DF1) )
start <- match(DF2$start, DF1$year)
end <- match(DF2$end, DF1$year)
m1 <- mapply(
function(r1,r2,co) mean(DF1[cbind(seq(r1,r2), co)]),
start,
end,
column
)
data.frame(
column=names(DF1)[column],
start=DF1$year[start],
end=DF1$year[end],
m1
)
# column start end m1
#1 C1 2005 2012 0.5861268
#2 C2 2001 2009 0.3942018
#3 C3 2006 2011 0.5853924
#4 C4 2005 2010 0.4904493
#5 C5 2009 2012 0.6783216
Upvotes: 0
Reputation: 114
If I've interpreted your question correctly, the following example should give you what you want, if what you want is the mean for each column in DF1 after each column has been subset to the year range in DF2:
# get the column names from DF2$column
c_list <- as.character(DF2$column)
# for each column name in c_list, store the start and end
# year, and find the mean of the column subset by year range
ml <- do.call(rbind, lapply(1:length(c_list), function(x){
start <- DF2[x, "start"]
end <- DF2[x, "end"]
mean(DF1[DF1$year >= start & DF1$year <= end, c_list[x]])
}))
# join the means with DF2
DF2.out <- cbind(DF2, ml)
> DF2.out
column start end ml
1 C1 2005 2012 0.5861268
2 C2 2001 2009 0.3942018
3 C3 2006 2011 0.5853924
4 C4 2005 2010 0.4904493
5 C5 2009 2012 0.6783216
Upvotes: 0
Reputation: 1340
I'm assuming your question is about summarizing one data frame by parameters you have in another data frame. In this case, the code below will help with part 1.
library(dplyr)
apply.by.colname <- function(data, col.name, year.start, year.end) {
data %>%
filter(year >= year.start & year <= year.end) %>%
select(matches(col.name))
}
new.df <- apply.by.colname(DF1, "C1", 2005, 2012)
sapply(new.df, mean)
For the complete solution, you may need to use this function in other custom functions or apply
calls.
Upvotes: 1