Reputation: 395
I am experienced in Fortran, but quite new to R. In Fortran I am used to nesting several do-loops, but I guess there are better methods in R. Some other questions were answered by using apply, but I am not sure whether this is the right way for me.
I want to do a bias correction for my model data. I know that packages exist for that, but I'd prefer to code it by myself. I have two data.frames, the first contains my model data:
library(dplyr)
library(tidyr) # separate() comes from tidyr, not dplyr
x <- round(runif(34698,0,20), 2)
df_a <- data.frame(date=as.Date(0:34697, origin="2006-01-01"),x)
df_a <- setNames(df_a, c("date","daily"))
df_a <- separate(df_a, date, into = c("year", "month", "day"), sep="-")
The second data frame contains the observed and modeled historical monthly means:
df_b <- data.frame(month=seq(01,12,by=1),obs=seq(1.1,12.1,by=1),model=seq(2.2,13.2,by=1))
df_b$month <- ifelse(nchar(df_b$month)!=2,paste0("0",df_b$month),df_b$month)
With the following code, I correct the data of my first data.frame by using the monthly means from the second data.frame. The code works fine, but I don't think it is idiomatic R. In particular, I would need even more for-loops, because I have several model outputs and for each model I have two different scenarios.
system.time(
for(i in 1:12){
for (j in 1:nrow(df_a)) {
if(df_b$month[i]==df_a$month[j]){
df_a$daily[j] <- df_a$daily[j]+(df_b$obs[i]-df_b$model[i])
}
}
}
)
I would really appreciate it if anyone could show me how to "improve" my style of coding in R.
Upvotes: 3
Views: 46
Reputation: 886948
A better option would be to do a left_join and mutate to create the new column
library(dplyr)
df_a1 <- df_a %>%
  left_join(df_b, by = "month") %>%
  mutate(daily = daily + obs - model) # same correction as the loop: daily + (obs - model)
system.time(df_a %>%
  left_join(df_b, by = "month") %>%
  mutate(daily = daily + obs - model))
# user system elapsed
# 0.201 0.011 0.213
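The same join should also cover the case from the question with several models and scenarios, without adding more loops. This is only a rough sketch, assuming the outputs are stacked in long format: the data frames sim and ref, the columns model_name and scenario, and all values are made up for illustration.

```r
library(dplyr)

# Hypothetical long-format model output: one row per day, model and scenario
sim <- data.frame(
  model_name = c("m1", "m1", "m2", "m2"),
  scenario   = c("s1", "s2", "s1", "s2"),
  month      = c("01", "01", "02", "02"),
  daily      = c(10, 11, 12, 13),
  stringsAsFactors = FALSE
)

# Hypothetical historical monthly means, one row per model and month
ref <- data.frame(
  model_name = c("m1", "m2"),
  month      = c("01", "02"),
  obs        = c(1.0, 2.0),
  model      = c(1.5, 3.0),
  stringsAsFactors = FALSE
)

# Join on both keys, then apply daily + (obs - model) to every row at once
corrected <- sim %>%
  left_join(ref, by = c("model_name", "month")) %>%
  mutate(daily = daily + obs - model)
```

Both scenarios of a model pick up that model's correction because scenario is not part of the join key. If the historical means differed per scenario, scenario would have to be added to ref and to the by argument.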
Also, as @parfait mentioned in the comments, a base R version with merge would be
system.time(within(merge(df_a, df_b, by = "month", all.x = TRUE), {
  daily <- daily + obs - model}))
# user system elapsed
# 0.260 0.015 0.275
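In base R a merge is not strictly needed either. As a minimal sketch (not included in the timings here), match() can look up the df_b row for each row of df_a, and the correction is then applied in one vectorised step. The small df_a below is a made-up stand-in for the OP's data.

```r
# Toy stand-ins for the OP's data frames (daily values are illustrative only)
df_a <- data.frame(month = c("01", "02", "12", "01"),
                   daily = c(10, 20, 30, 40),
                   stringsAsFactors = FALSE)
df_b <- data.frame(month = sprintf("%02d", 1:12),
                   obs   = seq(1.1, 12.1, by = 1),
                   model = seq(2.2, 13.2, by = 1),
                   stringsAsFactors = FALSE)

# Index of the df_b row that belongs to each row of df_a
idx <- match(df_a$month, df_b$month)

# Same correction as the loop: daily + (obs - model)
df_a$daily <- df_a$daily + (df_b$obs[idx] - df_b$model[idx])
```

Unlike merge, this keeps the original row order of df_a and adds no extra columns.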
Or with data.table
library(data.table)
system.time(setDT(df_a)[df_b, daily := daily + obs - model, on = .(month)])
# user system elapsed
# 0.198 0.011 0.208
and the OP's for loop
system.time(
for(i in 1:12){
for (j in 1:nrow(df_a)) {
if(df_b$month[i]==df_a$month[j]){
df_a$daily[j] <- df_a$daily[j]+(df_b$obs[i]-df_b$model[i])
}
}
}
)
# user system elapsed
# 9.661 2.741 12.306
Upvotes: 4