Picataro
Picataro

Reputation: 151

how to calculate weighted median by groups

I need to calculate a weighted median age by 3 variables (gender, sector and occupation). I know how to use "spatstat" weighted.median. But it gives the weighted age of the entire DF. I don't know hot to make it calculate it by groups/variables. I found another similar post from 2 years ago but i did not understand the solution offered in it. I did understand that you would need to use dplyr to group it. I have very limited understanding of the library. I would need someone to show me the code for the following DF so i can use it and do small changes to it if needed in the future.

My DF is as follows: The original df is much larger and has many more sectors and occupations. But the structure is the same.

 
Gender    Sector     occupation     Age     Weight
  M         A            1          45      13.12
  F         C            1          23       7.05
  M         A            3          67      34.12
  M         A            3          23      11.00
  F         C            1          33      17.45
  F         A            1          35      45.01
  F         C            3          57      21.45
  M         C            1          32      23.33
  F         A            3          48      44.54
  M         A            1          25      13.12
  F         C            1          27       7.05
  M         A            3          37      34.12
  M         A            3          26      11.00
  F         C            1          73      17.45
  F         A            1          49      45.01
  F         C            3          24      21.45
  M         C            1          25      23.33
  F         A            3          35      44.54
  M         A            1          48      13.12
  F         C            1          83       7.05
  M         A            3          66      34.12
  M         A            3          43      11.00
  F         C            1          36      17.45
  F         A            1          35      45.01
  F         C            3          57      21.45
  M         C            1          22      23.33
  F         A            3          47      44.54
  M         A            1          55      13.12
  F         C            1          28       7.05
  M         A            3          27      34.12
  M         A            3          25      11.00
  F         C            1          23      17.45
  F         A            1          46      45.01
  F         C            3          34      21.45
  M         C            1          27      23.33
  F         A            3          65      44.54

This is the result I need. NB! The median age in here is fake. It just represents the structure of the end result.

Gender    Sector     occupation   Median age
  M         A            1            23
  M         A            3            34
  M         C            1            45
  M         C            3            23
  F         A            1            34
  F         A            3            45
  F         C            1            43
  F         C            3            34

The question I found (mentioned above) I did not understand it because i do not understand mutate and the data in the example is presented in a way that i can't read. I'm new to R

weighted median by_group()

Upvotes: 1

Views: 1559

Answers (1)

akrun
akrun

Reputation: 887711

There is a weighted.median from spatstat package. We could use that along with tidyverse approach i.e. grouped by 'Gender', 'Sector' and 'occupation', apply the weighted.median with x as 'Age' and w as 'Weight' in summarise.

In the linked post, it is using mutate as it is creating a new column while summarise returns the summarised output i.e. one row for each group.

In addition, we should be cautious when using summarise/mutate as those functions are found in plyr as well and it could mask the dplyr functions if both packages are loaded i.e. if the masked function from plyr was used, then instead of grouping, it may return the weighted.median of the whole data. So, either load only dplyr or use explicitly dplyr::summarise

library(dplyr)
library(spatstat)
df1 %>% 
     group_by(Gender, Sector, occupation) %>%
     dplyr::summarise(MedianAge = weighted.median(Age, Weight),
           .groups = 'drop')

-output

# A tibble: 7 x 4
  Gender Sector occupation MedianAge
  <chr>  <chr>       <int>     <dbl>
1 F      A               1      35  
2 F      A               3      47  
3 F      C               1      31.0
4 F      C               3      34  
5 M      A               1      45  
6 M      A               3      33.8
7 M      C               1      25  

data

df1 <- structure(list(Gender = c("M", "F", "M", "M", "F", "F", "F", 
"M", "F", "M", "F", "M", "M", "F", "F", "F", "M", "F", "M", "F", 
"M", "M", "F", "F", "F", "M", "F", "M", "F", "M", "M", "F", "F", 
"F", "M", "F"), Sector = c("A", "C", "A", "A", "C", "A", "C", 
"C", "A", "A", "C", "A", "A", "C", "A", "C", "C", "A", "A", "C", 
"A", "A", "C", "A", "C", "C", "A", "A", "C", "A", "A", "C", "A", 
"C", "C", "A"), occupation = c(1L, 1L, 3L, 3L, 1L, 1L, 3L, 1L, 
3L, 1L, 1L, 3L, 3L, 1L, 1L, 3L, 1L, 3L, 1L, 1L, 3L, 3L, 1L, 1L, 
3L, 1L, 3L, 1L, 1L, 3L, 3L, 1L, 1L, 3L, 1L, 3L), Age = c(45L, 
23L, 67L, 23L, 33L, 35L, 57L, 32L, 48L, 25L, 27L, 37L, 26L, 73L, 
49L, 24L, 25L, 35L, 48L, 83L, 66L, 43L, 36L, 35L, 57L, 22L, 47L, 
55L, 28L, 27L, 25L, 23L, 46L, 34L, 27L, 65L), Weight = c(13.12, 
7.05, 34.12, 11, 17.45, 45.01, 21.45, 23.33, 44.54, 13.12, 7.05, 
34.12, 11, 17.45, 45.01, 21.45, 23.33, 44.54, 13.12, 7.05, 34.12, 
11, 17.45, 45.01, 21.45, 23.33, 44.54, 13.12, 7.05, 34.12, 11, 
17.45, 45.01, 21.45, 23.33, 44.54)), class = "data.frame", row.names = c(NA, 
-36L))

Upvotes: 4

Related Questions