Reputation: 3796
In SPSS you can calculate a mean only for cases that responded to a minimum number of questions. For example, I would type
COMPUTE compvar = MEAN.4(var1, var2, var3, var4, var5, var6, var7).
And this would generate a new variable (i.e., compvar) only for cases that had a value present for 4 or more of the array var1 - var7. That's what the .4 is doing in the command, setting a minimum number of responses before the command will run for a case.
Any tips on doing this in R so I can stop jumping to SPSS?
Upvotes: 0
Views: 944
Reputation: 44638
Let's say you had a data.frame with 5 variables:
df <- data.frame(
var1 = sample(c(NA,rnorm(5)),50,replace = TRUE),
var2 = sample(c(NA,rnorm(5)),50,replace = TRUE),
var3 = sample(c(NA,rnorm(5)),50,replace = TRUE),
var4 = sample(c(NA,rnorm(5)),50,replace = TRUE),
var5 = sample(c(NA,rnorm(5)),50,replace = TRUE)
)
I would first calculate the mean for every row. The following command computes each row mean while ignoring NA values (what SPSS would treat as missing, e.g. a 99 code or simply "missing").
df$compvar <- rowMeans(df, na.rm = TRUE)
Then I would set compvar to NA for those rows where the number of NA values is greater than X (in this example, 1). The sapply call converts the data.frame into a matrix of TRUE/FALSE values; taking rowSums of that matrix counts the NAs in each row, and the count can then be compared against a threshold.
df[rowSums(sapply(df, is.na)) > 1,]$compvar <- NA
You should look over each of the following to garner an understanding of what is being provided at each step:
sapply(df, is.na)
rowSums(sapply(df, is.na))
rowSums(sapply(df, is.na)) > 1
df[rowSums(sapply(df, is.na)) > 1,]
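To make the intermediate steps concrete, here is a small sketch on a hand-made data frame, so the results can be checked by eye. The name toy and its values are invented for illustration and are not part of the example above.

```r
# Toy data (hypothetical values): row 1 is complete, row 2 has one NA,
# row 3 has two NAs
toy <- data.frame(
  var1 = c(1, 2, NA),
  var2 = c(2, NA, NA),
  var3 = c(3, 4, 6)
)

na_flags <- sapply(toy, is.na)   # logical matrix, TRUE where a value is missing
na_count <- rowSums(na_flags)    # number of NAs in each row
too_many <- na_count > 1         # TRUE for rows with more than one NA

# Row means ignoring NAs, then blank out the rows flagged above
toy$compvar <- rowMeans(toy[, c("var1", "var2", "var3")], na.rm = TRUE)
toy$compvar[too_many] <- NA

print(toy)
```

Only the third row ends up with compvar set to NA, because it is the only row with more than one missing value.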
As a function, this could be written as:
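#' Row means with minimum response
#'
#' Emulates SPSS MEAN.X functionality
#' @param df A data.frame of numeric columns
#' @param x The minimum number of non-missing responses required per row.
#' @export
meanx <- function(df, x) {
  # Count non-missing responses before the new column is added
  n_valid <- rowSums(!is.na(df))
  df$compvar <- rowMeans(df, na.rm = TRUE)
  # Rows with fewer than x valid responses get no mean
  df$compvar[n_valid < x] <- NA
  return(df)
}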
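If you would rather get back just the new vector for a chosen list of variables, which is closer to how COMPUTE compvar = MEAN.4(var1, ...) reads in SPSS, a variant along these lines might work. The name row_mean_min, its arguments, and the survey data are hypothetical, and the sketch assumes every selected column is numeric.

```r
# Hypothetical helper: row means over selected columns, requiring at least
# `min_valid` non-missing values per row (otherwise NA).
row_mean_min <- function(data, cols, min_valid) {
  block <- as.matrix(data[, cols, drop = FALSE])
  n_valid <- rowSums(!is.na(block))     # non-missing responses per row
  out <- rowMeans(block, na.rm = TRUE)  # mean of whatever is present
  out[n_valid < min_valid] <- NA        # too few responses: no mean
  unname(out)
}

# Made-up survey data; the id column is deliberately left out of the mean
survey <- data.frame(
  id   = 1:3,
  var1 = c(1, NA, 5),
  var2 = c(2, 2, NA),
  var3 = c(3, 3, NA),
  var4 = c(4, 4, NA)
)
survey$compvar <- row_mean_min(survey, paste0("var", 1:4), min_valid = 3)
print(survey)
```

Selecting the columns by name keeps identifier columns and any previously computed compvar out of both the mean and the response count.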
Upvotes: 1
Reputation: 23788
This could represent a possibility:
compvar <- sapply(1:nrow(df),function(x) ifelse(sum(!is.na(df[x,])*1)>=4, mean(as.numeric(df[x,]),na.rm=TRUE),NA))
I'm assuming that your numeric data is stored in a data frame df. The output is a vector compvar of length nrow(df) which contains either the mean of the corresponding row in df, or NA if there are fewer than four non-NA entries in that row.
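As a quick check, here is the same expression run on a small hand-made data frame. The values are invented for illustration; only the layout of the call is changed.

```r
# Made-up data: 5 variables, 3 respondents
df <- data.frame(
  var1 = c(1, 1, NA),
  var2 = c(2, NA, NA),
  var3 = c(3, 3, 3),
  var4 = c(4, 4, 4),
  var5 = c(5, 5, NA)
)

# Mean of each row that has at least 4 non-NA values, NA otherwise
compvar <- sapply(1:nrow(df), function(x) {
  ifelse(sum(!is.na(df[x, ]) * 1) >= 4,
         mean(as.numeric(df[x, ]), na.rm = TRUE),
         NA)
})

print(compvar)
```

The first row has five values and the second has four, so both get a mean; the third has only two and gets NA.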
Upvotes: 2
Reputation: 24945
There isn't a built-in function as far as I know. Here's one you can try:
mycolmeans<-function(df,n){
holding<-colMeans(df,na.rm=TRUE)
holding[n > as.vector(colSums(!is.na(df)))]<-NA
holding
}
This assumes you have a data frame holding your values in columns, and that you want NA returned for any column with fewer than n non-missing values (missing values being denoted by NA).
x <- structure(list(a = c(1, 2, 3, 4, 5, 6), b = c(NA, NA, 3, 4, 5,
6)), .Names = c("a", "b"), row.names = c(NA, -6L), class = "data.frame")
mycolmeans(x,4)
mycolmeans(x,6)
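For reference, a sketch of what the two calls should give, followed by one way to reuse the same function row-wise by transposing. The transposition step is an addition here rather than part of the function as posted, and it assumes all columns are numeric.

```r
mycolmeans <- function(df, n) {
  holding <- colMeans(df, na.rm = TRUE)
  holding[n > as.vector(colSums(!is.na(df)))] <- NA
  holding
}

x <- data.frame(a = c(1, 2, 3, 4, 5, 6), b = c(NA, NA, 3, 4, 5, 6))

res4 <- mycolmeans(x, 4)  # a has 6 values and b has 4, so both means are kept
res6 <- mycolmeans(x, 6)  # b has only 4 values, so its mean becomes NA

# Row-wise use: t(x) is a matrix whose columns are the original rows
row_res <- mycolmeans(t(x), 2)  # rows 1 and 2 have only one value each

print(res4)
print(res6)
print(row_res)
```

Since the question is about a mean per case rather than per variable, the transposed call (or rowMeans with rowSums) is the form that corresponds to MEAN.4 in SPSS.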
Upvotes: 3