Reputation: 2663
I have two dataframes: one dataframe that has dichotomous variables representing whether a string match is present, and a second dataframe that applies "weights" for that string across different dimensions.
For example, df1
might look like this:
organic gluten_free kosher sugar_free
1 0 0 0
1 1 0 1
1 1 0 1
0 0 1 0
1 0 1 0
The row values in the second dataframe (df2
) should match the values of the column names in df1
, and with the value in each row representing a weight.
attribute eco-friendly healthy
organic 2 3
gluten_free 1 4
kosher 3 3
sugar_free 2 3
I then want to calculate the product of the weight for each row value in df1
when the colname
in df1
is equal to the row value in df2
into separate indices. For clarity, I have included the specific calculations for the eco-friendly
index below:
organic gluten_free kosher sugar-free eco-friendly
1 0 0 0 (1*2 + 0*1 + 0*3 + 0*2)
1 1 0 1 (1*2 + 1*1 + 0*3 + 1*2)
1 1 0 1 (1*2 + 1*1 + 0*3 + 1*2)
0 0 1 0 (0*2 + 0*1 + 1*3 + 0*2)
1 0 1 0 (1*2 + 0*1 + 1*3 + 0*2)
I have written a very ugly and slow loop function to complete this task, but am confident a more elegant solution exists. Some additional sample data is below.
> dput(df1[1:100,])
structure(list(organic = c("0", "0", "0", "0", "0", "0", "1", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "1", "1", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0"), gluten_free = c("0", "1", "0", "0", "1", "0", "0", "0", "0", "0", "0", "0", "1", "0", "0", "1", "1", "1", "1", "1", "0", "0", "0", "0", "1", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "1", "1", "0", "1", "1", "1", "1", "1", "1", "0", "0", "0", "0", "0", "0", "0", "0", "1", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "1", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "1", "1", "0", "0", "0", "0", "0", "1", "0", "1", "0"), kosher = c("0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "1", "0", "0", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "1", "0", "1", "1", "1", "0", "1", "1", "0", "1", "1", "1", "1", "1", "1", "1", "0", "0", "1", "1", "0", "0", "1", "0", "0", "0", "1")), row.names = c("2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "15", "17", "18", "19", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48", "49", "50", "51", "52", "53", "54", "55", "56", "57", "58", "59", "60", "61", "62", "63", "64", "65", "66", "67", "68", "69", "70", "71", "72", "73", "74", "77", "78", "79", "80", "81", "83", "84", "85", "86", "87", "88", "91", "92", "93", "95", "97", "98", "101", "103", "105", "106", "108", "117", "124", "125", "127", "129", "131", "132", "133", "136", "137"), class = "data.frame")
> dput(df2[1:3,])
structure(list(attribute = c("organic", "gluten_free", "kosher"), eco_friendly = c(1L, 3L, 2L), healthy = c(2L, 0L, 1L)), row.names = 1:3, class = "data.frame")
Upvotes: 5
Views: 101
Reputation: 18681
We can take the dot product of the two df
s, keeping in mind that the %*%
operator only works for numeric matrices:
df1[] <- lapply(df1, as.numeric)
output <- cbind(df1, as.matrix(df1) %*% as.matrix(df2[,-1]))
The downside to this is that the columns in df1
and the rows in df2
have to be in the correct order. To make sure the column and row order matches, we can use the following in place of df2[,-1]
:
df2[match(names(df1), df2$attribute),-1]
Output:
> head(output)
organic gluten_free kosher eco_friendly healthy
2 0 0 0 0 0
3 0 1 0 3 0
4 0 0 0 0 0
5 0 0 0 0 0
6 0 1 0 3 0
7 0 0 0 0 0
Upvotes: 3
Reputation: 1820
Is this what you want?
df <- data.frame(organic = c(1, 1, 1, 0, 1), gluten_free = c(0, 1, 1, 0, 0),
kosher = c(0, 0, 0, 1, 1), sugar_free = c(0, 1, 1, 0, 0))
df %>% mutate(eco_friendly = organic * 2 + gluten_free * 1 + kosher * 3 + sugar_free * 2)
Upvotes: 0