user1443010

Reputation: 267

Removing Survey non-response in R

I have a data frame with several continuous variables and several dummy variables. The survey this data frame comes from uses 6, 7, 8 and 9 to denote different types of non-response. I would like to replace 6, 7, 8 and 9 with NA whenever they appear in a dummy variable column, but leave them unchanged in the continuous variable columns.

Is there a concise way to go about doing this? Here's my data:

> dput(head(sfsuse[c(4:16)]))
structure(list(famsize = c(3L, 1L, 2L, 5L, 3L, 5L), famtype = c(2L, 
1L, 2L, 3L, 2L, 3L), cc = c(1L, 1L, 1L, 1L, 1L, 1L), nocc = c(1L, 
1L, 1L, 3L, 1L, 1L), pdloan = c(2L, 2L, 2L, 2L, 2L, 2L), help = c(2L, 
2L, 2L, 2L, 2L, 2L), budget = c(1L, 1L, 1L, 1L, 2L, 2L), income = c(340000L, 
20500L, 0L, 165000L, 95000L, -320000L), govtrans = c(7500L, 15500L, 
22000L, 350L, 0L, 9250L), childexp = c(0L, 0L, 0L, 0L, 0L, 0L
), homeown = c(1L, 1L, 1L, 1L, 1L, 2L), bank = c(2000L, 80000L, 
25000L, 20000L, 57500L, 120000L), vehval = c(33000L, 7500L, 5250L, 
48000L, 8500L, 50000L)), .Names = c("famsize", "famtype", "cc", 
"nocc", "pdloan", "help", "budget", "income", "govtrans", "childexp", 
"homeown", "bank", "vehval"), row.names = c(NA, 6L), class = "data.frame")

I'm trying to substitute NA for 6, 7, 8 and 9 in columns 3:7 and column 11. I know how to do this one column at a time, by column name:

 df$name[df$name %in% 6:9] <- NA

But I would have to repeat this for each column by name. Is there a concise way to do it by column index?

Thanks

Upvotes: 0

Views: 842

Answers (1)

Chris Taylor

Reputation: 47392

This function should work. It sets the values 6 to 9 in column k to NA and returns the modified data frame:

f <- function(data, k) {
  # rows where column k holds a non-response code (6, 7, 8 or 9)
  data[data[, k] %in% 6:9, k] <- NA
  data  # return the modified copy
}

Now, at the console, apply it to columns 3:7 and 11 (your data frame is called data here):

> for (k in c(3:7,11)) { data <- f(data,k) }
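
The same replacement can probably also be done without an explicit loop, by applying replace() over the selected columns with lapply(). The following is only a minimal sketch: the small data frame df and the index vector dummy_cols are hypothetical stand-ins made up for illustration, not the asker's real data. With the real data, dummy_cols would be c(3:7, 11).

```r
# Hypothetical toy data: two dummy columns (cc, pdloan) with some
# non-response codes added, plus a continuous column (income) that
# happens to contain an 8 which must be kept.
df <- data.frame(
  famsize = c(3L, 1L, 2L),
  cc      = c(1L, 7L, 1L),
  pdloan  = c(2L, 2L, 9L),
  income  = c(340000L, 8L, 0L)
)

# Column indices of the dummy variables (assumption for this toy example;
# for the data in the question this would be c(3:7, 11)).
dummy_cols <- c(2, 3)

# For each selected column, set entries equal to 6, 7, 8 or 9 to NA.
# Columns outside dummy_cols are not touched.
df[dummy_cols] <- lapply(df[dummy_cols], function(x) replace(x, x %in% 6:9, NA))
```

Because only df[dummy_cols] is assigned to, the 8 in income is left alone, which matches the requirement of leaving the continuous columns as they are.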

Upvotes: 1
