user3329830

Reputation: 1

How to exclude certain observations while generating summary statistics without creating a new data frame in R

My problem is:

I have a large number of numeric variables for which I need to generate summary statistics. Some of the observations are coded "-99", which means the participant does not know the answer to the survey question.

While calculating means for such variables, I want to exclude the "-99" observations. Since I have a lot of variables, it would be quite onerous to use "subset".

Does anyone know an easier way?

PS: I know that for factors, the Summarize(df, exclude = "") command in the FSA package could work. I am just not sure whether there is an equivalent for numeric variables.

Upvotes: 0

Views: 1530

Answers (1)

Thomas

Reputation: 44555

Just make yourself a simple wrapper function around summary:

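set.seed(1)
x <- rnorm(100)
# recode 10 randomly chosen values as the "don't know" code
x[sample(seq_along(x), 10)] <- -99
# summary() on everything except the -99 entries
summary2 <- function(x) summary(x[x!=-99])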

Compare results:

> summary(x)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-99.00000  -0.70810  -0.04209  -9.79400   0.59810   2.40200

> summary2(x)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-2.21500 -0.52640  0.07445  0.11770  0.67230  2.40200 
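
Since the question is about many variables at once, the same idea can be applied across every numeric column with sapply, without building a second data frame. This is only a minimal sketch: the data frame df, its column names (q1, q2, grp) and the helper mean_excl are made up for illustration, and it assumes -99 is the only code to exclude.

```r
# Hypothetical example data; the column names q1, q2 and grp are made up.
df <- data.frame(
  q1  = c(1, 2, -99, 4),
  q2  = c(-99, 10, 20, -99),
  grp = c("a", "b", "a", "b")
)

# Flag the numeric columns so non-numeric ones are left alone.
num_cols <- sapply(df, is.numeric)

# Mean of a vector, skipping the "don't know" code and any existing NA values.
# %in% is used instead of != so that NA entries do not produce NA indices.
mean_excl <- function(x, code = -99) mean(x[!x %in% code], na.rm = TRUE)

# One mean per numeric column, returned as a named vector.
col_means <- sapply(df[num_cols], mean_excl)
col_means
```

Here q1 averages 1, 2 and 4, and q2 averages 10 and 20. Swapping mean_excl for a wrapper like summary2 inside lapply(df[num_cols], ...) should give the full summary for each column in the same way.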

Upvotes: 1
