Reputation: 3
I have two data frames
df1
Year Farm 1 Farm 2 Farm 3
2015 1000 2000 1500
2016 500 2000 1000
df 2
Year Month Farm 1 Farm 2 Farm 3
2015 Jan 1 1 3
2015 Feb 1 2 1
2016 Jan 2 2 2
2016 Feb 2 1 2
I want to multiply the annual values in df1 across the respective farms in df2 based on the year so that the output is...
df 3
Year Month Farm 1 Farm 2 Farm 3
2015 Jan 1000 2000 4500
2015 Feb 1000 4000 1500
2016 Jan 1000 4000 2000
2016 Feb 1000 2000 2000
I've got the years properly formatted, but have been struggling to find the solution with group_by in dplyr. Should I be trying a different path?
Upvotes: 0
Views: 184
Reputation: 269674
1) Base R Assuming df1
and df2
shown reproducibly in the Note at the end, merge the data frames giving data frame m
. Then create a new data frame df3
by replacing all but the first two coiumns of d2
with the product of those same columns of df2
and the appropriate columns of m
. No packages are used.
m <- merge(df2, df1, by = 1)
df3 <- replace(df2, -(1:2), df2[-(1:2)] * m[-(1:ncol(df2))] )
giving:
> df3
Year Month Farm1 Farm2 Farm3
1 2015 Jan 1000 2000 4500
2 2015 Feb 1000 4000 1500
3 2016 Jan 1000 4000 2000
4 2016 Feb 1000 2000 2000
2) sqldf If you only have a few farms so that it is feasible to write them each out then:
library(sqldf)
sqldf("select
Year,
b.Month,
a.Farm1 * b.Farm1 Farm1,
a.Farm2 * b.Farm2 Farm2,
a.Farm3 * b.Farm3 Farm3
from df2 b left join df1 a using (Year)")
giving:
Year Month Farm1 Farm2 Farm3
1 2015 Jan 1000 2000 4500
2 2015 Feb 1000 4000 1500
3 2016 Jan 1000 4000 2000
4 2016 Feb 1000 2000 2000
Lines1 <- "
Year Farm1 Farm2 Farm3
2015 1000 2000 1500
2016 500 2000 1000"
Lines2 <- "
Year Month Farm1 Farm2 Farm3
2015 Jan 1 1 3
2015 Feb 1 2 1
2016 Jan 2 2 2
2016 Feb 2 1 2"
df1 <- read.table(text = Lines1, header = TRUE)
df2 <- read.table(text = Lines2, header = TRUE)
Upvotes: 1
Reputation: 887213
Here is an option with join from data.table
. Join the second dataset ('df2') with the first ('df1') on
the 'Year' column, and multiply the .SD
(Subset of data.table based on the columns specified in the .SDcols
) with the corresponding columns in the first data, assign (:=
) the output to update the 'Farm" columns in the second dataset
library(data.table)
nm1 <- grep("Farm", names(df1), value = TRUE)
setDT(df2)[df1, (nm1) := .SD * mget(paste0("i.", names(.SD))),
on = .(Year), .SDcols = nm1]
df2
# Year Month Farm1 Farm2 Farm3
#1: 2015 Jan 1000 2000 4500
#2: 2015 Feb 1000 4000 1500
#3: 2016 Jan 1000 4000 2000
#4: 2016 Feb 1000 2000 2000
Upvotes: 1
Reputation: 164
I would approach this problem by transforming the data frames into long format, joining them, and then doing the calculation. Here is an example:
# Load packages
library(dplyr)
library(tidyr)
# Make-up data
df1 = data.frame(Year = 2008:2018,
Farm1 = runif(n = 11, min = 0, max = 2000),
Farm2 = runif(n = 11, min = 0, max = 2000),
Farm3 = runif(n = 11, min = 0, max = 2000))
df2 = expand.grid(Year = 2008:2018,
Month = month.abb[1:12]) %>%
mutate(Farm1 = runif(n = 132, min = 0, max = 10),
Farm2 = runif(n = 132, min = 0, max = 10),
Farm3 = runif(n = 132, min = 0, max = 10))
# Transform data into long format
df1.long = df1 %>%
gather(key = Farm, value = AnnualValue, Farm1:Farm3)
df2.long = df2 %>%
gather(key = Farm, value = Value, Farm1:Farm3)
# Now left_join on Year and multiply columns
df.comb = left_join(df1.long, df2.long) %>%
mutate(NewValue = Value * AnnualValue)
# Transform back to wide format (if necessary)
df.comb.wide = df.comb %>%
select(-AnnualValue, -Value) %>% # drop values not included in wide format
spread(key = Farm, value = NewValue)
Upvotes: 0