medavis6

Reputation: 863

Scale percentage data to mean of 50%

I have a data set with a weighted mean of 0.4860247, and I am trying to normalize the data around 0.5. I would use scale(), but my issue is that I don't have the raw observations. Instead, I have the total count in one column and the percent in the other.

data <- data.frame(
  percent = c(0.455188841201717, 0.461817275747508, 0.464727272727273,
              0.466502777777778, 0.472820895522388, 0.475576045627376,
              0.489019313304721, 0.490855421686747, 0.491118959107807,
              0.506631578947368, 0.526727272727273, 0.541372950819672),
  n = c(233, 301, 198, 360, 201, 1052, 466, 332, 269, 304, 374, 244)
)

How can I use the weighted numbers to create a scaled distribution around 0.5? Do I need to simulate the data (with rnorm()) and then run scale()?

EDIT: n will stay the same. I would like to adjust percent so that it is distributed around a mean of 0.5. Basically, my data is skewed so that its mean is not 0.5. What I'm attempting to do is normalize the data to have a mean of 0.5 so that I can see how much better or worse a number is in comparison to that mean of 0.5.

The current weighted mean of my data is 0.4860247. My desired output is to scale all numbers greater than the weighted mean to be above 0.5 and all numbers less than the weighted mean to be below 0.5.

Upvotes: 0

Views: 464

Answers (1)

d.b

Reputation: 32548

x = 0.5*sum(df$n) - sum(df$percent*df$n) # additional 'percent*n' needed to reach a weighted mean of 0.5
df$pr = (df$percent*df$n)/sum(df$percent*df$n) # proportion by which 'x' should be split across rows
df$percent_2 = df$percent + df$pr*x/df$n # add each row's share of 'x' (per unit of n) to its 'percent'
sum(df$percent_2*df$n)/sum(df$n) # new weighted mean
#[1] 0.5
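
Working through the algebra, `df$pr*x/df$n` simplifies to `df$percent * x / sum(df$percent*df$n)`, so the adjustment above amounts to multiplying every `percent` by `0.5 / weighted mean`. The sketch below shows that shorter form next to an additive shift. The additive shift is an alternative I am assuming may also fit the question, not part of the code above, and the column names `percent_mult` and `percent_shift` are made up for illustration.

```r
# Same values as in the question
df <- data.frame(
  percent = c(0.455188841201717, 0.461817275747508, 0.464727272727273,
              0.466502777777778, 0.472820895522388, 0.475576045627376,
              0.489019313304721, 0.490855421686747, 0.491118959107807,
              0.506631578947368, 0.526727272727273, 0.541372950819672),
  n = c(233, 301, 198, 360, 201, 1052, 466, 332, 269, 304, 374, 244)
)

wm <- weighted.mean(df$percent, df$n)   # current weighted mean

# Multiplicative rescale: algebraically the same as percent + pr*x/n
df$percent_mult <- df$percent * 0.5 / wm

# Additive shift (alternative): moves every value by the same amount
df$percent_shift <- df$percent + (0.5 - wm)

# Both versions have a weighted mean of 0.5 (up to floating point error)
weighted.mean(df$percent_mult, df$n)
weighted.mean(df$percent_shift, df$n)
```

Either way, values above the original weighted mean end up above 0.5 and values below it end up below 0.5. The multiplicative version stretches the spread slightly (by the factor `0.5 / wm`), while the additive version leaves the differences between rows unchanged.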

DATA

df = structure(list(percent = c(0.455188841201717, 0.461817275747508, 
0.464727272727273, 0.466502777777778, 0.472820895522388, 0.475576045627376, 
0.489019313304721, 0.490855421686747, 0.491118959107807, 0.506631578947368, 
0.526727272727273, 0.541372950819672), n = c(233, 301, 198, 360, 
201, 1052, 466, 332, 269, 304, 374, 244)), .Names = c("percent", 
"n"), class = "data.frame", row.names = c(NA, -12L))
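
If the goal is closer to what scale() does, i.e. expressing each row as a distance from the mean in standard-deviation units, a weighted version can be computed directly from `percent` and `n` without simulating data. This is only a sketch under assumptions: it treats `n` as frequency weights, uses a population-style weighted variance (dividing by `sum(n)`), and the names `wsd` and `z` are mine.

```r
# Same values as in the question
df <- data.frame(
  percent = c(0.455188841201717, 0.461817275747508, 0.464727272727273,
              0.466502777777778, 0.472820895522388, 0.475576045627376,
              0.489019313304721, 0.490855421686747, 0.491118959107807,
              0.506631578947368, 0.526727272727273, 0.541372950819672),
  n = c(233, 301, 198, 360, 201, 1052, 466, 332, 269, 304, 374, 244)
)

wm  <- weighted.mean(df$percent, df$n)                     # weighted mean
wsd <- sqrt(sum(df$n * (df$percent - wm)^2) / sum(df$n))   # weighted sd (assumed definition)

# Weighted z-score: positive means above the weighted mean, negative means below
df$z <- (df$percent - wm) / wsd

weighted.mean(df$z, df$n)   # approximately 0, up to floating point error
```

The sign of `z` shows whether a row is better or worse than the weighted mean, and its magnitude shows by how much relative to the weighted spread.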

Upvotes: 1
