roody
roody

Reputation: 2663

Combining two dataframes to calculate variable when column names are equal to row values

I have two dataframes: one dataframe that has dichotomous variables representing whether a string match is present, and a second dataframe that applies "weights" for that string across different dimensions.

For example, df1 might look like this:

organic    gluten_free    kosher   sugar_free
1          0              0        0
1          1              0        1
1          1              0        1
0          0              1        0
1          0              1        0

The row values in the second dataframe (df2) should match the values of the column names in df1, and with the value in each row representing a weight.

attribute    eco-friendly     healthy 
organic      2                3         
gluten_free  1                4         
kosher       3                3         
sugar_free   2                3         

I then want to calculate the product of the weight for each row value in df1 when the colname in df1 is equal to the row value in df2 into separate indices. For clarity, I have included the specific calculations for the eco-friendly index below:

organic    gluten_free    kosher   sugar-free  eco-friendly
1          0              0        0           (1*2 + 0*1 + 0*3 + 0*2)
1          1              0        1           (1*2 + 1*1 + 0*3 + 1*2)
1          1              0        1           (1*2 + 1*1 + 0*3 + 1*2)
0          0              1        0           (0*2 + 0*1 + 1*3 + 0*2)
1          0              1        0           (1*2 + 0*1 + 1*3 + 0*2)

I have written a very ugly and slow loop function to complete this task, but am confident a more elegant solution exists. Some additional sample data is below.

> dput(df1[1:100,]) 
structure(list(organic = c("0", "0", "0", "0", "0", "0", "1",  "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",  "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",  "0", "0", "0", "0", "1", "1", "0", "0", "0", "0", "0", "0", "0",  "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",  "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",  "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",  "0", "0"), gluten_free = c("0", "1", "0", "0", "1", "0", "0",  "0", "0", "0", "0", "0", "1", "0", "0", "1", "1", "1", "1", "1",  "0", "0", "0", "0", "1", "0", "0", "0", "0", "0", "0", "0", "0",  "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",  "0", "0", "0", "0", "1", "1", "0", "1", "1", "1", "1", "1", "1",  "0", "0", "0", "0", "0", "0", "0", "0", "1", "0", "0", "0", "0",  "0", "0", "0", "0", "0", "0", "1", "0", "0", "0", "0", "0", "0",  "0", "0", "0", "0", "1", "1", "0", "0", "0", "0", "0", "1", "0",  "1", "0"), kosher = c("0", "0", "0", "0", "0", "0", "0", "0",  "0", "0", "0", "0", "0", "0", "0", "0", "0", "1", "0", "0", "1",  "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",  "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",  "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",  "1", "1", "1", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",  "1", "0", "1", "1", "1", "0", "1", "1", "0", "1", "1", "1", "1",  "1", "1", "1", "0", "0", "1", "1", "0", "0", "1", "0", "0", "0",  "1")), row.names = c("2", "3", "4", "5", "6", "7", "8", "9",  "10", "11", "12", "15", "17", "18", "19", "22", "23", "24", "25",  "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36",  "37", "38", "39", "40", "41", "42", "43", "44", "45", "46", "47",  "48", "49", "50", "51", "52", "53", "54", "55", "56", "57", "58",  "59", "60", "61", "62", "63", "64", "65", "66", "67", "68", "69",  "70", "71", "72", "73", "74", "77", "78", "79", "80", "81", "83",  "84", "85", "86", "87", "88", "91", "92", "93", "95", "97", "98",  "101", "103", "105", "106", "108", "117", "124", "125", "127",  "129", "131", "132", "133", "136", "137"), class = "data.frame")

> dput(df2[1:3,]) 
structure(list(attribute = c("organic", "gluten_free", "kosher"), eco_friendly = c(1L, 3L, 2L), healthy = c(2L, 0L, 1L)), row.names = 1:3, class = "data.frame")

Upvotes: 5

Views: 101

Answers (2)

acylam
acylam

Reputation: 18681

We can take the dot product of the two dfs, keeping in mind that the %*% operator only works for numeric matrices:

df1[] <- lapply(df1, as.numeric)
output <- cbind(df1, as.matrix(df1) %*% as.matrix(df2[,-1]))

The downside to this is that the columns in df1 and the rows in df2 have to be in the correct order. To make sure the column and row order matches, we can use the following in place of df2[,-1]:

df2[match(names(df1), df2$attribute),-1]

Output:

> head(output)
  organic gluten_free kosher eco_friendly healthy
2       0           0      0            0       0
3       0           1      0            3       0
4       0           0      0            0       0
5       0           0      0            0       0
6       0           1      0            3       0
7       0           0      0            0       0

Upvotes: 3

massisenergy
massisenergy

Reputation: 1820

Is this what you want?

df <- data.frame(organic = c(1, 1, 1, 0, 1), gluten_free = c(0, 1, 1, 0, 0),           
                 kosher = c(0, 0, 0, 1, 1), sugar_free = c(0, 1, 1, 0, 0))
df %>% mutate(eco_friendly = organic * 2 + gluten_free * 1 + kosher * 3 + sugar_free * 2)

Upvotes: 0

Related Questions