EBDorff
EBDorff

Reputation: 3

Multiplying year based vector with a year & month based matrix in R

I have two data frames

df1 

Year  Farm 1  Farm 2  Farm 3
2015    1000    2000    1500
2016    500     2000    1000

df 2

Year Month  Farm 1 Farm 2 Farm 3
2015  Jan    1        1      3
2015  Feb    1        2      1
2016  Jan    2        2      2
2016  Feb    2        1      2

I want to multiply the annual values in df1 across the respective farms in df2 based on the year so that the output is...

df 3 

Year    Month   Farm 1      Farm 2      Farm 3
2015    Jan     1000        2000        4500
2015    Feb     1000        4000        1500
2016    Jan     1000        4000        2000
2016    Feb     1000        2000        2000

I've got the years properly formatted, but have been struggling to find the solution with group_by in dplyr. Should I be trying a different path?

Upvotes: 0

Views: 184

Answers (3)

G. Grothendieck
G. Grothendieck

Reputation: 269674

1) Base R Assuming df1 and df2 shown reproducibly in the Note at the end, merge the data frames giving data frame m. Then create a new data frame df3 by replacing all but the first two coiumns of d2 with the product of those same columns of df2 and the appropriate columns of m. No packages are used.

m <- merge(df2, df1, by = 1)
df3 <- replace(df2, -(1:2), df2[-(1:2)] * m[-(1:ncol(df2))] )

giving:

> df3
  Year Month Farm1 Farm2 Farm3
1 2015   Jan  1000  2000  4500
2 2015   Feb  1000  4000  1500
3 2016   Jan  1000  4000  2000
4 2016   Feb  1000  2000  2000

2) sqldf If you only have a few farms so that it is feasible to write them each out then:

library(sqldf)

sqldf("select 
         Year, 
         b.Month, 
         a.Farm1 * b.Farm1 Farm1,
         a.Farm2 * b.Farm2 Farm2,
         a.Farm3 * b.Farm3 Farm3
       from df2 b left join df1 a using (Year)")

giving:

  Year Month Farm1 Farm2 Farm3
1 2015   Jan  1000  2000  4500
2 2015   Feb  1000  4000  1500
3 2016   Jan  1000  4000  2000
4 2016   Feb  1000  2000  2000

Note

Lines1 <- "
Year  Farm1  Farm2  Farm3
2015    1000    2000    1500
2016    500     2000    1000"

Lines2 <- "
Year Month  Farm1 Farm2 Farm3
2015  Jan    1        1      3
2015  Feb    1        2      1
2016  Jan    2        2      2
2016  Feb    2        1      2"

df1 <- read.table(text = Lines1, header = TRUE)
df2 <- read.table(text = Lines2, header = TRUE)

Upvotes: 1

akrun
akrun

Reputation: 887213

Here is an option with join from data.table. Join the second dataset ('df2') with the first ('df1') on the 'Year' column, and multiply the .SD (Subset of data.table based on the columns specified in the .SDcols) with the corresponding columns in the first data, assign (:=) the output to update the 'Farm" columns in the second dataset

library(data.table)
nm1 <- grep("Farm", names(df1), value = TRUE)
setDT(df2)[df1, (nm1) := .SD * mget(paste0("i.", names(.SD))), 
           on = .(Year), .SDcols = nm1]
df2
#   Year Month Farm1 Farm2 Farm3
#1: 2015   Jan  1000  2000  4500
#2: 2015   Feb  1000  4000  1500
#3: 2016   Jan  1000  4000  2000
#4: 2016   Feb  1000  2000  2000

Upvotes: 1

Travis Hinkelman
Travis Hinkelman

Reputation: 164

I would approach this problem by transforming the data frames into long format, joining them, and then doing the calculation. Here is an example:

# Load packages
library(dplyr)
library(tidyr)

# Make-up data
df1 = data.frame(Year = 2008:2018,
                 Farm1 = runif(n = 11, min = 0, max = 2000),
                 Farm2 = runif(n = 11, min = 0, max = 2000),
                 Farm3 = runif(n = 11, min = 0, max = 2000))

df2 = expand.grid(Year = 2008:2018,
                  Month = month.abb[1:12]) %>% 
  mutate(Farm1 = runif(n = 132, min = 0, max = 10),
         Farm2 = runif(n = 132, min = 0, max = 10),
         Farm3 = runif(n = 132, min = 0, max = 10))

# Transform data into long format
df1.long = df1 %>%
  gather(key = Farm, value = AnnualValue, Farm1:Farm3)

df2.long = df2 %>%
  gather(key = Farm, value = Value, Farm1:Farm3)

# Now left_join on Year and multiply columns
df.comb = left_join(df1.long, df2.long) %>% 
  mutate(NewValue = Value * AnnualValue)

# Transform back to wide format (if necessary)
df.comb.wide = df.comb %>% 
  select(-AnnualValue, -Value) %>% # drop values not included in wide format
  spread(key = Farm, value = NewValue)

Upvotes: 0

Related Questions