Reputation: 69
I am using stby from summarytools to calculate weighted descriptive statistics by group. However, I get different results than when I first filter on the grouping variable and then apply the descr function from summarytools. See below: mydf is my unfiltered data frame, and score is a 0-10 variable whose mean I want.
## When I filter first and split my df
filtered_male <- mydf %>% filter(gender == 1)
with(filtered_male, stby(score, gender, descr, weights = weight))
Weighted Descriptive Statistics
score by gender
Data Frame: filtered_male
Weights: weight
N: 838
1
--------------- ------------
Mean 6.86
Std.Dev 2.93
Min 0.00
Median 8.00
Max 10.00
MAD 2.97
CV 0.43
N.Valid 1509584.07
Pct.Valid 99.70
##when I don't split my df
with(mydf, stby(score, gender, descr, weights = weight, simplify = TRUE))
Weighted Descriptive Statistics
score by gender
Data Frame: mydf
Weights: weight
N: 838
1 2
--------------- ------------ ------------
Mean 7.01 6.79
Std.Dev 2.81 3.02
Min 0.00 0.00
Median 8.00 8.00
Max 10.00 10.00
MAD 2.97 2.97
CV 0.40 0.45
N.Valid 1715494.12 1379339.65
Pct.Valid 56.05 45.07
Any ideas on why this is happening, or how I can fix it to get the correct weighted mean? (I've checked the answers manually, and the mean I get when I filter first is correct.)
Upvotes: 0
Views: 74
Reputation: 5905
Until there is an official fix for this, you can produce a valid stby object with the following workaround.
### Packages
library(dplyr)
library(purrr)
library(summarytools)
### Data
mtcars
### Output with summarytools
st <- with(mtcars, stby(qsec, cyl, descr, weights = wt, simplify = TRUE))
Initial output :
Weighted Descriptive Statistics
qsec by cyl
Data Frame: mtcars
Weights: wt
N: 11
4 6 8
--------------- ------- ------- -------
Mean 19.04 17.95 16.73
Std.Dev 1.53 1.64 1.21
Min 16.70 15.50 14.50
Median 18.87 18.29 17.15
Max 22.90 20.22 18.00
MAD 1.48 1.90 0.93
CV 0.08 0.09 0.07
N.Valid 34.72 21.50 46.30
Pct.Valid 33.72 20.88 44.97
To fix the output :
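### Replace the values in the stby object with ones computed per group
mtcars %>%
  group_by(cyl) %>%
  # Run descr() separately on each group, weighting by that group's wt
  group_map(~ descr(.x$qsec, weights = .x$wt)) %>%
  # Overwrite the cells of st, one group at a time
  walk2(.y = seq_along(.), function(x, y) { st[[y]][, ] <<- x[, ] })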
### Bonus: show the number of observations of each group in the header
attributes(st[[1]])$data_info$N.Obs <- paste(
  map_int(seq_along(st), ~ attributes(st[[.x]])$data_info$N.Obs),
  collapse = ","
)
Output :
Weighted Descriptive Statistics
qsec by cyl
Data Frame: mtcars
Weights: wt
N: 11,7,14
4 6 8
--------------- -------- -------- --------
Mean 19.38 18.12 16.89
Std.Dev 1.72 1.59 1.13
Min 16.70 15.50 14.50
Median 19.24 18.46 17.34
Max 22.90 20.22 18.00
MAD 1.09 2.00 0.71
CV 0.09 0.09 0.07
N.Valid 25.14 21.82 55.99
Pct.Valid 100.00 100.00 100.00
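As an independent cross-check of the group means, here is a minimal sketch using only base R. It assumes the weighted mean is the usual sum(w * x) / sum(w) within each group. The names groups and weighted_means are only illustrative.

```r
# Weighted mean of qsec within each cyl group, using base R only.
# mtcars, qsec, cyl and wt mirror the example above.
groups <- split(mtcars, mtcars$cyl)

# weighted.mean() computes sum(w * x) / sum(w) for each subset
weighted_means <- sapply(groups, function(d) weighted.mean(d$qsec, d$wt))

round(weighted_means, 2)
```

If these values agree with the Mean row of the per-group output, the replaced stby object is reporting the per-group weighted means.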
Upvotes: 0