Chris
Chris

Reputation: 353

How to create a calculated column using a lookup table and a formula in R?

I have a dataframe that looks like this:

   V1  V2   V3
   3    4   3
   2    4   3
   4    4   3
   4    4   4
   1    4   2
   4    2   4
   4    4   1
   4    4   2
   3    4   1
   4    4   4
   4    4   2
   4    4   2
   2    1   2
   3    2   3
   3    4   3
   3    4   2
   4    4   2
   4    4   4
   2    3   3
   3    4   1

I also have a lookup table like this:

 V_id   coeff  weight 
  V1   0.82     4.77   
  V2   0.75     4.77   
  V3   0.67     4.77

I want to use these values in the lookup table to create a new calculated column in DF1 using

(V1*coeff+V2*coeff+V3*coeff)/weight

The final dataframe should look like this.

   V1   V2  V3  new_column
   3    4   3   1.566037736
   2    4   3   1.394129979
   4    4   3   1.737945493
   4    4   4   1.878406709
   1    4   2   1.081761006
   4    2   4   1.5639413
   4    4   1   1.457023061
   4    4   2   1.597484277
   3    4   1   1.285115304
   4    4   4   1.878406709
   4    4   2   1.597484277
   4    4   2   1.597484277
   2    1   2   0.78197065
   3    2   3   1.251572327
   3    4   3   1.566037736
   3    4   2   1.42557652
   4    4   2   1.597484277
   4    4   4   1.878406709
   2    3   3   1.236897275
   3    4   1   1.285115304

I have to do this for a data frame with 1125 columns.

Upvotes: 1

Views: 310

Answers (2)

Edward
Edward

Reputation: 19069

Edit Updated answer to updated question (data frame has 1,125 columns):

df1_V <- as.matrix(df1) # or select the "V" columns using df1[, 1:1125]
df1$new_column <- df1_V %*% df2$coef / df2$weight[1]

This is a general solution which will work for any number of columns as long as the columns of df1 are arranged in the same manner (across the columns of the data frame) as the coef values are ordered (row-wise) in df2, and the number of columns in df1 equals the number of rows in df2, that is ncol(df1_V) = nrow(df2).


Answer (to original question):

library(dplyr)

df %>%
  mutate(new_column = (V1*0.82 + V2*0.75 + V3*0.67) / 4.77)

   V1 V2 V3 new_column
1   3  4  3  1.5660377
2   2  4  3  1.3941300
3   4  4  3  1.7379455
4   4  4  4  1.8784067
5   1  4  2  1.0817610
6   4  2  4  1.5639413
...

Alternative:

df1$new_column <- as.matrix(df1) %*% c(0.82, 0.75, 0.67) / 4.77

Upvotes: 1

akrun
akrun

Reputation: 887431

We can convert the first data to long format and then do a group by row number to get the calculated column

library(dplyr)
library(tidyr)
df1 %>% 
   mutate(rn = row_number()) %>%
   pivot_longer(cols = -rn, names_to = "V_id") %>% 
   left_join(df2)  %>% 
   group_by(rn) %>% 
   summarise(new_column = sum(coeff *value)/weight[1]) %>% 
   select(new_column) %>% 
   bind_cols(df1, .)
# A tibble: 20 x 4
#      V1    V2    V3 new_column
#   <int> <int> <int>      <dbl>
# 1     3     4     3      1.57 
# 2     2     4     3      1.39 
# 3     4     4     3      1.74 
# 4     4     4     4      1.88 
# 5     1     4     2      1.08 
# 6     4     2     4      1.56 
# 7     4     4     1      1.46 
# 8     4     4     2      1.60 
# 9     3     4     1      1.29 
#10     4     4     4      1.88 
#11     4     4     2      1.60 
#12     4     4     2      1.60 
#13     2     1     2      0.782
#14     3     2     3      1.25 
#15     3     4     3      1.57 
#16     3     4     2      1.43 
#17     4     4     2      1.60 
#18     4     4     4      1.88 
#19     2     3     3      1.24 
#20     3     4     1      1.29 

In base R, we can also do

df1$new_column <- c(tcrossprod(df2$coeff, as.matrix(df1)))/df2$weight[1]

data

df1 <- structure(list(V1 = c(3L, 2L, 4L, 4L, 1L, 4L, 4L, 4L, 3L, 4L, 
4L, 4L, 2L, 3L, 3L, 3L, 4L, 4L, 2L, 3L), V2 = c(4L, 4L, 4L, 4L, 
4L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 2L, 4L, 4L, 4L, 4L, 3L, 4L
), V3 = c(3L, 3L, 3L, 4L, 2L, 4L, 1L, 2L, 1L, 4L, 2L, 2L, 2L, 
3L, 3L, 2L, 2L, 4L, 3L, 1L)), class = "data.frame", row.names = c(NA, 
-20L))

df2 <- structure(list(V_id = c("V1", "V2", "V3"), coeff = c(0.82, 0.75, 
0.67), weight = c(4.77, 4.77, 4.77)), class = "data.frame", row.names = c(NA, 
-3L))

Upvotes: 1

Related Questions