Reputation: 1
My problem is:
I have a large number of numeric variables for which I need to generate summary statistics. Some of the observations are coded "-99", which means the participant does not know the answer to the survey question.
While calculating means for such variables, I want to exclude the "-99" observations. Since I have a lot of variables, it would be quite onerous to use "subset".
Does anyone know an easier way?
PS: I know that for factors, the Summarize(df, exclude = "") command in the FSA package could work. I am just not sure if there is an equivalent for numeric variables.
Upvotes: 0
Views: 1530
Reputation: 44555
Just make yourself a simple wrapper function around summary:
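# Simulate 100 values and recode 10 of them as -99
set.seed(1)
x <- rnorm(100)
x[sample(seq_along(x), 10)] <- -99
# Wrapper that drops the -99 code before summarising
summary2 <- function(x) summary(x[x != -99])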
Compare results:
> summary(x)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-99.00000  -0.70810  -0.04209  -9.79400   0.59810   2.40200
> summary2(x)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
 -2.21500  -0.52640   0.07445   0.11770   0.67230   2.40200
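Since you mention having many variables, here is a rough sketch of the same idea applied to a whole data frame at once. It recodes -99 as NA in the numeric columns and then uses colMeans with na.rm = TRUE. The data frame and its column names (id, q1, q2, grp) are made up for illustration, and I am assuming -99 never occurs as a legitimate value in your data.

```r
# Hypothetical survey data; the column names are invented for this example.
df <- data.frame(
  id  = 1:5,
  q1  = c(3, -99, 4, 5, -99),
  q2  = c(-99, 2, 2, 1, 4),
  grp = c("a", "b", "a", "b", "a")
)

# Pick the numeric columns to summarise, leaving out the id column.
num_cols <- setdiff(names(df)[sapply(df, is.numeric)], "id")

# Recode the "does not know" code -99 as NA in those columns only.
# %in% is used instead of == so that existing NAs are left untouched.
df[num_cols] <- lapply(df[num_cols], function(v) replace(v, v %in% -99, NA))

# Column means, ignoring the recoded values.
means <- colMeans(df[num_cols], na.rm = TRUE)
means
```

Once the -99 codes are NA, most base summary functions (mean, median, sd and so on) can skip them through their na.rm argument, so there is no need to subset each variable separately.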
Upvotes: 1