Reputation: 159
I have a large data frame that contains both blank missing values and NA's. Performing summary(factor(df$col)) gives me something like
    A     B           C  NA's
  123 50000 90000 26000 12476
(Notice the blank level name after B, above the count 90000.)
and sum(is.na(df$col)) is 12476, the same as the number of NAs, but I'd like it to be the sum of the blanks and the NAs.
I tried to create a level for the blanks by doing
levels(df$col) <- c("A", "B", "Blank", "C")
and then trying df$col <- factor(df$col, exclude="Blank"), and it says that NAs were generated, but my output is the same. Does anyone know how to create NAs based on a factor level, or have a better solution for replacing the missing values? I think the issue might be that the blanks are more than one whitespace character, so they didn't get turned into NAs, but I don't know how to confirm that.
Upvotes: 3
Views: 3532
Reputation: 40803
Try this:
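# stringsAsFactors=TRUE is needed in R >= 4.0, where strings are no longer converted to factors by default
df <- data.frame(a=11:18, col=c("C", "", "A", NA, "A", "", "C", NA), stringsAsFactors=TRUE)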
levels(df$col) # "" "A" "C"
sum(is.na(df$col)) # 2
df$col <- factor(df$col, levels=LETTERS[1:3])
levels(df$col) # "A" "B" "C"
sum(is.na(df$col)) # 4
Since the new levels do not include blank (""), all blanks will become NA.
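If the blanks might be one or more whitespace characters rather than empty strings, and you would rather not hard-code the valid levels, something along these lines may help. It is only a sketch: the data frame below is made-up example data, and it assumes every empty or whitespace-only entry should count as missing.

```r
# Hypothetical example data: blanks of different widths plus real NAs
df <- data.frame(col = c("C", "", "A", NA, "  ", "B", " ", NA),
                 stringsAsFactors = TRUE)

# Printing the levels with quotes makes whitespace-only levels visible
print(levels(df$col), quote = TRUE)

# Flag entries that are empty or consist only of whitespace
x <- as.character(df$col)
blank <- !is.na(x) & grepl("^[[:space:]]*$", x)
sum(blank)          # 3

# Turn the flagged entries into NA and rebuild the factor
x[blank] <- NA
df$col <- factor(x)

levels(df$col)      # "A" "B" "C"
sum(is.na(df$col))  # 5
```

Checking sum(blank) before overwriting anything confirms how many entries were blank, and the final count is the blanks plus the original NAs.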
Upvotes: 2