Rakhim Rakhimov
Rakhim Rakhimov

Reputation: 109

Get a standard deviation by group and subtract from mean column standard deviation in R

I have a toy df that is included below. For columns Grades and Age, I need to find the difference between mean standard deviation from the total standard deviation, by University. The values in the new table - which should have 4 rows and 2 columns - should therefore represent the differences in the standard deviation of Grades and Age in each University compared to the total df standard deviation of Grades and Ages (across all universities).

Data:

library(dplyr)
df <- tibble::tribble(
           ~University, ~Countries, ~Grades, ~Age,
        "University-1",      "USA",      46,    29,
        "University-1",       "UK",      84,    30,
        "University-1",   "Sweden",       5,    28,
        "University-1",    "Spain",      40,    26,
        "University-1", "Portugal",      49,    29,
        "University-1",    "Italy",      16,    24,
        "University-1",      "USA",      34,    19,
        "University-1",       "UK",      66,    28,
        "University-1",   "Sweden",       9,    25,
        "University-1",    "Spain",      80,    20,
        "University-1", "Portugal",      55,    20,
        "University-1",    "Italy",       4,    21,
        "University-1",      "USA",      93,    18,
        "University-1",       "UK",      62,    28,
        "University-1",   "Sweden",      80,    30,
        "University-2",    "Spain",       1,    22,
        "University-2", "Portugal",      56,    25,
        "University-2",    "Italy",       9,    29,
        "University-2",      "USA",      40,    21,
        "University-2",       "UK",      54,    20,
        "University-2",   "Sweden",      60,    24,
        "University-2",    "Spain",      77,    21,
        "University-2", "Portugal",      22,    18,
        "University-2",    "Italy",      53,    29,
        "University-2",      "USA",      11,    21,
        "University-2",       "UK",      65,    27,
        "University-2",   "Sweden",      24,    27,
        "University-2",    "Spain",      18,    23,
        "University-2", "Portugal",      73,    19,
        "University-2",    "Italy",      79,    22,
        "University-1",      "USA",       2,    26,
        "University-1",       "UK",      83,    23,
        "University-1",   "Sweden",       5,    19,
        "University-1",    "Spain",      75,    19,
        "University-1", "Portugal",      12,    21,
        "University-1",    "Italy",      68,    29,
        "University-1",      "USA",     100,    21,
        "University-1",       "UK",      49,    21,
        "University-1",   "Sweden",      81,    20,
        "University-1",    "Spain",      99,    23,
        "University-1", "Portugal",      82,    24,
        "University-1",    "Italy",      23,    26,
        "University-1",      "USA",      86,    30,
        "University-1",       "UK",      50,    20,
        "University-1",   "Sweden",       4,    19,
        "University-2",    "Spain",      12,    25,
        "University-2", "Portugal",      12,    21,
        "University-2",    "Italy",      45,    21,
        "University-2",      "USA",      16,    26,
        "University-2",       "UK",      56,    23,
        "University-2",   "Sweden",      63,    24,
        "University-2",    "Spain",      37,    28,
        "University-2", "Portugal",      86,    21,
        "University-2",    "Italy",      95,    18,
        "University-2",      "USA",      56,    20,
        "University-2",       "UK",      27,    20,
        "University-2",   "Sweden",       3,    27,
        "University-2",    "Spain",      18,    27,
        "University-3", "Portugal",      68,    27,
        "University-3",    "Italy",      48,    21,
        "University-3", "Portugal",      86,    21,
        "University-3",    "Italy",      95,    18,
        "University-3",      "USA",      56,    20,
        "University-3",       "UK",      27,    20,
        "University-3",   "Sweden",       3,    27,
        "University-3",    "Spain",      18,    27,
        "University-3", "Portugal",      68,    27,
        "University-3",    "Italy",      48,    21,
        "University-4", "Portugal",      86,    21,
        "University-4",    "Italy",      95,    18,
        "University-4",      "USA",      56,    20,
        "University-4",       "UK",      27,    20,
        "University-4",   "Sweden",       3,    27,
        "University-4",    "Spain",      18,    27,
        "University-4", "Portugal",      68,    27,
        "University-4",    "Italy",      48,    21
        )

My attempt at getting at this:

df <- df %>% 
    mutate(grades_sd = sd(Grades),
           age_sd = sd(Age)) %>% 
    group_by(University) %>%
    summarise(Grades_sd = sd(Grades) - grades_sd,
              Age_sd = sd(Age) - age_sd)

This code performs the correct (I think) calculations but outputs the table in the wrong format. I'd appreciate any guidance on this.

Edit: Including expected output below.

output <- tibble::tribble(
           ~University, ~Grades_sd, ~Age_sd,
        "University-1", 2.666482, 0.40233934,   
        "University-2", -2.790652, -0.34945170, 
        "University-3", 0.881169, -0.03754330,
        "University-4", 2.398070, 0.06701784)   

Upvotes: 1

Views: 490

Answers (2)

jpdugo17
jpdugo17

Reputation: 7106

We can use the ungrouped version of the df.

library(dplyr)

df %>% 
    group_by(University) %>%
    summarise(Grades_sd = sd(Grades) - sd(df$Grades),
              Age_sd    = sd(Age) - sd(df$Age))
#> # A tibble: 4 × 3
#>   University   Grades_sd  Age_sd
#>   <chr>            <dbl>   <dbl>
#> 1 University-1     2.67   0.402 
#> 2 University-2    -2.79  -0.349 
#> 3 University-3    -0.881 -0.0375
#> 4 University-4     2.40   0.0670

Created on 2022-01-06 by the reprex package (v2.0.1)

Upvotes: 1

Samet S&#246;kel
Samet S&#246;kel

Reputation: 2670

since there is the same record for each group, summarise repeats them. taking unique value will return only one and there will be one summary row for each group.

df %>% 
    mutate(grades_sd = sd(Grades),
           age_sd = sd(Age)) %>% 
    group_by(University) %>%
    summarise(Grades_sd = sd(Grades) - unique(grades_sd),
              Age_sd = sd(Age) - unique(age_sd))

output;

  University   Grades_sd  Age_sd
  <chr>            <dbl>   <dbl>
1 University-1     2.67   0.402 
2 University-2    -2.79  -0.349 
3 University-3    -0.881 -0.0375
4 University-4     2.40   0.0670

Upvotes: 3

Related Questions