Reputation: 353
I have a dataframe that looks like this:
V1 V2 V3
3 4 3
2 4 3
4 4 3
4 4 4
1 4 2
4 2 4
4 4 1
4 4 2
3 4 1
4 4 4
4 4 2
4 4 2
2 1 2
3 2 3
3 4 3
3 4 2
4 4 2
4 4 4
2 3 3
3 4 1
I also have a lookup table like this:
V_id coeff weight
V1 0.82 4.77
V2 0.75 4.77
V3 0.67 4.77
I want to use these values in the lookup table to create a new calculated column in DF1 using
(V1*coeff+V2*coeff+V3*coeff)/weight
The final dataframe should look like this.
V1 V2 V3 new_column
3 4 3 1.566037736
2 4 3 1.394129979
4 4 3 1.737945493
4 4 4 1.878406709
1 4 2 1.081761006
4 2 4 1.5639413
4 4 1 1.457023061
4 4 2 1.597484277
3 4 1 1.285115304
4 4 4 1.878406709
4 4 2 1.597484277
4 4 2 1.597484277
2 1 2 0.78197065
3 2 3 1.251572327
3 4 3 1.566037736
3 4 2 1.42557652
4 4 2 1.597484277
4 4 4 1.878406709
2 3 3 1.236897275
3 4 1 1.285115304
I have to do this for a data frame with 1125 columns.
Upvotes: 1
Views: 310
Reputation: 19069
Edit Updated answer to updated question (data frame has 1,125 columns):
df1_V <- as.matrix(df1) # or select the "V" columns using df1[, 1:1125]
df1$new_column <- df1_V %*% df2$coef / df2$weight[1]
This is a general solution which will work for any number of columns as long as the columns of df1 are arranged in the same manner (across the columns of the data frame) as the coef values are ordered (row-wise) in df2, and the number of columns in df1 equals the number of rows in df2, that is ncol(df1_V) = nrow(df2)
.
Answer (to original question):
library(dplyr)
df %>%
mutate(new_column = (V1*0.82 + V2*0.75 + V3*0.67) / 4.77)
V1 V2 V3 new_column
1 3 4 3 1.5660377
2 2 4 3 1.3941300
3 4 4 3 1.7379455
4 4 4 4 1.8784067
5 1 4 2 1.0817610
6 4 2 4 1.5639413
...
Alternative:
df1$new_column <- as.matrix(df1) %*% c(0.82, 0.75, 0.67) / 4.77
Upvotes: 1
Reputation: 887431
We can convert the first data to long format and then do a group by row number to get the calculated column
library(dplyr)
library(tidyr)
df1 %>%
mutate(rn = row_number()) %>%
pivot_longer(cols = -rn, names_to = "V_id") %>%
left_join(df2) %>%
group_by(rn) %>%
summarise(new_column = sum(coeff *value)/weight[1]) %>%
select(new_column) %>%
bind_cols(df1, .)
# A tibble: 20 x 4
# V1 V2 V3 new_column
# <int> <int> <int> <dbl>
# 1 3 4 3 1.57
# 2 2 4 3 1.39
# 3 4 4 3 1.74
# 4 4 4 4 1.88
# 5 1 4 2 1.08
# 6 4 2 4 1.56
# 7 4 4 1 1.46
# 8 4 4 2 1.60
# 9 3 4 1 1.29
#10 4 4 4 1.88
#11 4 4 2 1.60
#12 4 4 2 1.60
#13 2 1 2 0.782
#14 3 2 3 1.25
#15 3 4 3 1.57
#16 3 4 2 1.43
#17 4 4 2 1.60
#18 4 4 4 1.88
#19 2 3 3 1.24
#20 3 4 1 1.29
In base R
, we can also do
df1$new_column <- c(tcrossprod(df2$coeff, as.matrix(df1)))/df2$weight[1]
df1 <- structure(list(V1 = c(3L, 2L, 4L, 4L, 1L, 4L, 4L, 4L, 3L, 4L,
4L, 4L, 2L, 3L, 3L, 3L, 4L, 4L, 2L, 3L), V2 = c(4L, 4L, 4L, 4L,
4L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 2L, 4L, 4L, 4L, 4L, 3L, 4L
), V3 = c(3L, 3L, 3L, 4L, 2L, 4L, 1L, 2L, 1L, 4L, 2L, 2L, 2L,
3L, 3L, 2L, 2L, 4L, 3L, 1L)), class = "data.frame", row.names = c(NA,
-20L))
df2 <- structure(list(V_id = c("V1", "V2", "V3"), coeff = c(0.82, 0.75,
0.67), weight = c(4.77, 4.77, 4.77)), class = "data.frame", row.names = c(NA,
-3L))
Upvotes: 1