R dropping NA's in logical column levels

Question

I have a data frame that includes a corrupt row containing NAs and "". I cannot remove this row from the .csv file I am importing into R, because Excel cannot open a .csv file of this size.

When I first read.csv() the file, I run the check below to remove the row with NA:

if (any(is.na(unique(data$A)))) {
  print("WARNING: data has a corrupt row in it!")
  data <- data[!is.na(data$A), ]
}

However, as if it were a factor, the A column seems to remember NA as a level:

> summary(data$A)
   Mode   FALSE    TRUE    NA's 
logical  185692   36978       0

This obviously causes issues when I am trying to fit a linear model. How can I get rid of the NA as a logical level here?

I tried this, but it doesn't seem to work:

A <- as.logical(droplevels(factor(data_combine$A)))
summary(A)
   Mode   FALSE    TRUE    NA's 
logical  185692   36978       0 
unique(A)
[1] FALSE  TRUE

Rich Scriven · Accepted Answer

First, your data$A is not a factor; it's a logical. The summary methods are not the same for factors and logicals: logicals are handled by summary.default, while factors dispatch to summary.factor. The result itself tells you that the variable is a logical (Mode logical). The NA's entry there is a count of missing values, not a level, and a count of 0 means none are left.

fac <- factor(c(NA, letters[1:4]))
log <- c(NA, logical(4), !logical(2))
summary(fac)
#   a    b    c    d NA's 
#   1    1    1    1    1 
summary(log)
#    Mode   FALSE    TRUE    NA's 
# logical       4       2       1

See ?summary for the differences.
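
As a quick check of the point above, here is a small sketch on a made-up logical vector `x` (an assumption for illustration, not the asker's data). It shows that a logical has no levels to drop, and that removing the missing values brings the NA count to zero. Depending on the R version, summary() then either prints NA's as 0 or leaves the entry out.

```r
# Hypothetical logical vector with one missing value
x <- c(TRUE, FALSE, NA, TRUE)

class(x)       # "logical"
is.factor(x)   # FALSE
levels(x)      # NULL: a logical carries no levels attribute

# Remove the missing value, the same way the question's filter does
x_clean <- x[!is.na(x)]

sum(is.na(x))        # 1
sum(is.na(x_clean))  # 0

# Compare the two summaries: only the NA count differs
summary(x)
summary(x_clean)
```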

Second, your call

A <- as.logical(droplevels(factor(data_combine$A)))
summary(A)

is also calling summary.default, because you wrapped droplevels with as.logical (why?), which turns the factor straight back into a logical. So don't change data_combine$A at all, and just try

summary(data_combine$A)

and see how that goes. For more information, please provide a sample of your data.
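
Since no sample data was posted, the following is only a sketch on a made-up stand-in for data_combine; the column names `A` and `y` and all values are assumptions. It illustrates that the factor round trip from the question hands back the same logical vector, and that once the NA row is filtered out, lm() sees an ordinary two-valued predictor.

```r
# Hypothetical stand-in for data_combine (columns A and y are assumed)
data_combine <- data.frame(
  A = c(TRUE, FALSE, NA, TRUE, FALSE),
  y = c(1.2, 0.7, NA, 1.9, 0.4)
)

# Drop the corrupt row, as in the question
data_combine <- data_combine[!is.na(data_combine$A), ]

# The factor round trip returns the same logical vector, so it changes nothing
A <- as.logical(droplevels(factor(data_combine$A)))
identical(A, data_combine$A)   # TRUE

# No missing values remain in A
anyNA(data_combine$A)          # FALSE

# lm() treats the logical A as a two-valued predictor
fit <- lm(y ~ A, data = data_combine)
names(coef(fit))               # "(Intercept)" "ATRUE"
```

If a model still complains after this kind of filtering, the missing values are probably in another column, which a call such as colSums(is.na(data_combine)) would reveal.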
