M.Teich
M.Teich

Reputation: 607

Perform calculation for multiple groups with columns from seperate data frames

I want to perform a simple calculation for multiple groups, where columns from seperate data frames are multiplied based on a unique identifier. This is the first data frame:

year1 <- c(rep(2016,3),rep(2017,3),rep(2018,3))
group <- rep(letters[1:3],3)
prop <- c(0.3,0.6,0.1,0.4,0.3,0.3,0.2,0.5,0.3)
df1 <- as.data.frame(cbind(year1,group,prop))
df1$prop <- as.numeric(as.character(df1$prop))

This is the second:

year2 <- c(2016,2017,2018)
value <- c(325,483,742)
df2 <- as.data.frame(cbind(year2,value))

I would now like to add a column to the first data frame, where for each year1 and group in df1 prop is multiplied with value for the corresponding year2 in df2.

In theory, a solution using group_by and mutate would do the trick, but I don't know how to index year for both data frames with these commands. Any suggestions will be well appreciated. Thanks!

Upvotes: 0

Views: 65

Answers (2)

YOLO
YOLO

Reputation: 21739

Another way:

## using data.table
setDT(df1)
setDT(df2)

# set column types of key columns to be same
df1[, year1 := as.numeric()]

# merge files and get result
df1 <- merge(df1, df2, by.x = 'year1', by.y = 'year2')
df1[,result := prop*value, .(year1, group)]

   year1 group prop value result
1:  2016     a  0.3   325   97.5
2:  2016     b  0.6   325  195.0
3:  2016     c  0.1   325   32.5
4:  2017     a  0.4   483  193.2
5:  2017     b  0.3   483  144.9
6:  2017     c  0.3   483  144.9
7:  2018     a  0.2   742  148.4
8:  2018     b  0.5   742  371.0
9:  2018     c  0.3   742  222.6

Upvotes: 0

Stewart Ross
Stewart Ross

Reputation: 1044

You can achieve this with a join between the two tables. This can be done in base R using the merge function, or in dplyr using one of several join functions. I have used left_join for this example.

Your df1 data frame exemplar sets the year as a factor, so this has to be converted to numeric first. Your real data may not have that problem. The left join in the example below is used to make sure that all the rows in df1 are present in the joined results.

df1$year1 = as.numeric(as.character(df1$year1))
df3 = 
  left_join(df1, df2, by = c("year1" = "year2")) %>%
  mutate(result = prop * value)

>df3
  year1 group prop value result
1  2016     a  0.3   325   97.5
2  2016     b  0.6   325  195.0
3  2016     c  0.1   325   32.5
4  2017     a  0.4   483  193.2
5  2017     b  0.3   483  144.9
6  2017     c  0.3   483  144.9
7  2018     a  0.2   742  148.4
8  2018     b  0.5   742  371.0
9  2018     c  0.3   742  222.6

Upvotes: 3

Related Questions