Reputation: 3796

Calculate mean for cases that responded to a minimum number of items in R

In SPSS you can calculate means for cases that responded to a minimum number of questions. In SPSS I would type

COMPUTE compvar = MEAN.4(var1, var2, var3, var4, var5, var6, var7).

This generates a new variable (i.e., compvar) only for cases that have a value present for 4 or more of the variables var1 to var7. That's what the .4 in the command does: it sets the minimum number of valid responses a case needs before the mean is computed for it.

Any tips on doing this in R so I can stop jumping to SPSS?

Upvotes: 0

Answers (4)

Daniel

Reputation: 7832

See mean_n() in the sjmisc-package.

mean_n(data, 4)

Upvotes: 0

Brandon Bertelsen

Reputation: 44638

Let's say you had a data.frame with 5 variables:

df <- data.frame(
  var1 = sample(c(NA,rnorm(5)),50,replace = TRUE),
  var2 = sample(c(NA,rnorm(5)),50,replace = TRUE),
  var3 = sample(c(NA,rnorm(5)),50,replace = TRUE),
  var4 = sample(c(NA,rnorm(5)),50,replace = TRUE),
  var5 = sample(c(NA,rnorm(5)),50,replace = TRUE)
)

I would first calculate the mean for every row. The following command does so while ignoring any NA values in the row (what SPSS treats as "missing", e.g. values coded 99).

df$compvar <- rowMeans(df, na.rm = TRUE)

Then I would set compvar to NA for those rows where the number of NA values is greater than X (in this example, 1). sapply(df, is.na) converts the data.frame into a matrix of TRUE/FALSE values; taking rowSums of it gives the number of missing values per row, which you can then test against a condition.

df[rowSums(sapply(df, is.na)) > 1,]$compvar <- NA

Run each of the following in turn to see what is produced at each step:

sapply(df, is.na)
rowSums(sapply(df, is.na))
rowSums(sapply(df, is.na)) > 1
df[rowSums(sapply(df, is.na)) > 1,]
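Because the sample data above is random, the intermediate results differ on every run. As a minimal sketch, the same steps can be traced on a small hand-made data frame (the name toy and its values are made up for illustration), where each result can be checked by eye:

```r
# Hypothetical toy data: row 1 is complete, row 2 has two NAs, row 3 has one NA
toy <- data.frame(
  var1 = c(1, NA, 3),
  var2 = c(2, NA, NA),
  var3 = c(3, 6, 3)
)

na_flags <- sapply(toy, is.na)   # logical matrix, TRUE where a value is missing
na_count <- rowSums(na_flags)    # missing values per row: 0, 2, 1
too_many <- na_count > 1         # FALSE, TRUE, FALSE

# Row means over the item columns, ignoring NAs
toy$compvar <- rowMeans(toy[c("var1", "var2", "var3")], na.rm = TRUE)

# Blank out the mean for rows with more than one missing value
toy$compvar[too_many] <- NA      # compvar is now 2, NA, 3
```

Indexing the column directly (toy$compvar[too_many]) also works when no row exceeds the limit, whereas assigning into an empty row subset of a data.frame raises an error.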

As a function, this could be written as:

#' Row means with minimum response
#'
#' Emulates SPSS MEAN.X functionality
#' @param df A data.frame
#' @param x The number of responses required per row.
#' @export
meanx <- function(df, x) {
  # Count non-missing responses per row before compvar is added as a column
  n_valid <- rowSums(!is.na(df))
  df$compvar <- rowMeans(df, na.rm = TRUE)
  # Rows with fewer than x responses get no mean
  df$compvar[n_valid < x] <- NA
  return(df)
}
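The SPSS command in the question names specific variables, while a data frame usually also holds columns such as an ID that should stay out of the mean. A possible variant, sketched here under the assumption that the item columns are passed by name (row_mean_min, vars, min_valid and the survey data are hypothetical names, not part of the answer above), returns just the vector of means:

```r
# Hypothetical helper: row mean over the columns named in `vars`,
# NA unless at least `min_valid` of those columns are non-missing
row_mean_min <- function(data, vars, min_valid) {
  items <- data[, vars, drop = FALSE]      # keep only the item columns
  n_valid <- rowSums(!is.na(items))        # valid responses per row
  out <- rowMeans(items, na.rm = TRUE)     # mean of the valid responses
  out[n_valid < min_valid] <- NA           # too few responses: no mean
  out
}

# Made-up example data with an id column that must not enter the mean
survey <- data.frame(
  id   = 1:3,
  var1 = c(1, NA, 5),
  var2 = c(2, NA, NA),
  var3 = c(3, 4, NA),
  var4 = c(4, 4, NA)
)

# Rough analogue of COMPUTE compvar = MEAN.3(var1, var2, var3, var4).
survey$compvar <- row_mean_min(survey, paste0("var", 1:4), min_valid = 3)
# compvar: 2.5 for row 1; NA for rows 2 and 3 (only 2 and 1 valid responses)
```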

Upvotes: 1

RHertel

Reputation: 23788

This could represent a possibility:

compvar <- sapply(seq_len(nrow(df)), function(x) {
  # Keep the row mean only if the row has at least 4 non-NA entries
  if (sum(!is.na(df[x, ])) >= 4) mean(as.numeric(df[x, ]), na.rm = TRUE) else NA
})

I'm assuming that your numeric data is stored in a dataframe df. The output is a vector compvar of length nrow(df) which contains either the mean of the corresponding row in df, or NA if there are fewer than four non-NA entries in that row.

Upvotes: 2

jeremycg

Reputation: 24945

There isn't a built-in function as far as I know. Here's one you can try:

mycolmeans<-function(df,n){
  holding<-colMeans(df,na.rm=TRUE)
  holding[n > as.vector(colSums(!is.na(df)))]<-NA
  holding
}

This assumes you have a dataframe holding your values in columns, and you want an NA returned when it has too many missing values, which are denoted as NAs.

x <- data.frame(a = c(1, 2, 3, 4, 5, 6), b = c(NA, NA, 3, 4, 5, 6))

mycolmeans(x,4)
mycolmeans(x,6)
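To see what these two calls should return, the same computation can be spelled out step by step on the example data. This is only an illustrative sketch; the names n_valid, col_means, res4 and res6 are made up here:

```r
x <- data.frame(a = c(1, 2, 3, 4, 5, 6), b = c(NA, NA, 3, 4, 5, 6))

n_valid   <- colSums(!is.na(x))          # non-missing values per column: a = 6, b = 4
col_means <- colMeans(x, na.rm = TRUE)   # a = 3.5, b = 4.5

# Keep a column mean only when the column has enough non-missing values
res4 <- ifelse(n_valid >= 4, col_means, NA)   # a = 3.5, b = 4.5
res6 <- ifelse(n_valid >= 6, col_means, NA)   # a = 3.5, b = NA
```

Note that this works per column. For a per-case (row-wise) result as in the SPSS example, the analogous base R functions are rowSums() and rowMeans().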

Upvotes: 3
