Using tidyverse to dynamically mutate one variable from one grouped dataset from another dataset

Question

Let's say I work with different classes (nodes, in my dataset) and I have thousands of students. Each student has their own math score, and I need to compare all individual scores with the group mean/sd. To deal with that, I have two different datasets. The first one is "a table".

This data frame is formed of several classes (nodes), their means, and their sd.

I also have another dataset composed of students' results, like this one:

I want to have another dataset in which I get all individual results (i.e., 11, 6, 10, etc) and subtract this result from all means in the first dataset. In the future, it will be needed to check all results and all nodes together.

In other words, from the first nome (number 12 in the image), I will subtract 11 (student's result) from 68 (mean result), 6 (student's result) from 68 (mean), 10 (student's result) from 68 (mean), etc. Then I'll move for the second node (number 7 in the image), and I will do the same (subtract 11 from 74 (mean result), 6 from 74 (mean result), 10 from 74),

The final output I would like to have is below. Please, take this image as illustrative-only:

Due to questions, to make my question clear, the following command also brings an interesting solution, but it's not functional since I'll have to manually add all columns to the new dataset:

test_result %>% 
  mutate(total_header_node_12 = x - default_table$t_mean[1]) %>% 
  mutate(total_header_node_74 = x - default_table$t_mean[2])

I searched for other questions but I did not find any solution. Any help is valuable. I use tidyverse, and I would like to remain within tidyverse environment. Thank you To reproduce:

> dput(default_table)
structure(list(node = structure(c(6L, 3L, 5L, 1L, 2L, 4L, 7L), .Label = c("4", 
"5", "7", "8", "10", "12", "13"), class = "factor"), t_mean = c(68.8219178082192, 
74.3260869565217, 83.0178571428571, 92.2108108108108, 98.3304347826087, 
88.6111111111111, 48.4), t_sd = c(14.4351088961341, 16.9448394654941, 
13.0272663858681, 12.2011483603603, 12.1775472144027, 14.5621088567959, 
10.4876948807826), vars = c(1, 1, 1, 1, 1, 1, 1), n = c(121, 
74, 92, 616, 191, 58, 7), mean = c(68, 74.6891891891892, 82.8369565217391, 
91.3944805194805, 97.738219895288, 88.0172413793103, 48.7142857142857
), sd = c(14.0226008048911, 16.1151045250761, 11.0426517498479, 
12.6758935948866, 12.0212336250146, 15.9169901273025, 8.63547500554709
), min = c(32, 32, 58, 36, 56, 44, 39), max = c(97, 113, 104, 
123, 128, 124, 60), range = c(65, 81, 46, 87, 72, 80, 21), se = c(1.27478189135374, 
1.87334284914993, 1.15127602962793, 0.510726307415094, 0.869825937534791, 
2.09000319547951, 3.26390275965596), q0_25 = c(59, 64, 74.75, 
84, 90, 80, 41.5), q0_5 = c(68, 73.5, 81.5, 92, 98, 87, 47), 
    q0_75 = c(80, 87.75, 92.25, 100, 106, 98.75, 56)), class = "data.frame", row.names = c(NA, 
-7L))


test_result <- data.frame(x = rnorm(100,10,2))

Ronak Shah · Accepted Answer

Probably, you can try :

library(dplyr)

default_table %>%
  tidyr::crossing(test_result) %>%
  mutate(comparative_mean = x - t_mean)

This creates all combinations of default_table with test_result and then we subtract the two values from each other.

Using tidyverse to dynamically mutate one variable from one grouped dataset from another dataset

Answers (2)

Related Questions