Reputation: 8474
I am working on a large dataset, with some rows with NAs and others with blanks:
df <- data.frame(ID = c(1:7),
                 home_pc = c("", "CB4 2DT", "NE5 7TH", "BY5 8IB", "DH4 6PB", "MP9 7GH", "KN4 5GH"),
                 start_pc = c(NA, "Home", "FC5 7YH", "Home", "CB3 5TH", "BV6 5PB", NA),
                 end_pc = c(NA, "CB5 4FG", "Home", "", "Home", "", NA))
How do I remove the NAs and blanks in one go (in the start_pc and end_pc columns)? I have in the past used:
df<- df[-which(is.na(df$start_pc)), ]
... to remove the NAs - is there a similar command to remove the blanks?
Upvotes: 79
Views: 236910
Reputation: 81
An easy approach is to convert all the blank cells to NA
and then keep only the complete cases. You might also look at na.omit
examples; it is a widely discussed topic.
df[df==""]<-NA
df<-df[complete.cases(df),]
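Since the question only concerns start_pc and end_pc, the same idea can be restricted to those two columns. A minimal sketch, assuming the df from the question (cols, blank and clean are illustrative names, not part of the original answer):

```r
# Example data from the question
df <- data.frame(ID = c(1:7),
                 home_pc = c("", "CB4 2DT", "NE5 7TH", "BY5 8IB", "DH4 6PB", "MP9 7GH", "KN4 5GH"),
                 start_pc = c(NA, "Home", "FC5 7YH", "Home", "CB3 5TH", "BV6 5PB", NA),
                 end_pc = c(NA, "CB5 4FG", "Home", "", "Home", "", NA))

# Columns to check (assumption: only these two matter)
cols <- c("start_pc", "end_pc")

# Turn blanks into NA, one column at a time
for (col in cols) {
  blank <- !is.na(df[[col]]) & df[[col]] == ""
  df[[col]][blank] <- NA
}

# Keep rows that are complete in the chosen columns only
clean <- df[complete.cases(df[, cols]), ]
clean$ID
```

With the question's data this should leave the rows with ID 2, 3 and 5.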
Upvotes: 8
Reputation: 81
An alternative solution is to remove the rows with blanks in a single variable:
df <- subset(df, VAR != "")
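For the question's data, VAR would be start_pc or end_pc. A small sketch, assuming the df from the question (clean is an illustrative name): subset() treats an NA condition as FALSE, so a single call should drop the NA rows as well as the blank ones.

```r
# Example data from the question
df <- data.frame(ID = c(1:7),
                 home_pc = c("", "CB4 2DT", "NE5 7TH", "BY5 8IB", "DH4 6PB", "MP9 7GH", "KN4 5GH"),
                 start_pc = c(NA, "Home", "FC5 7YH", "Home", "CB3 5TH", "BV6 5PB", NA),
                 end_pc = c(NA, "CB5 4FG", "Home", "", "Home", "", NA))

# Rows where either comparison is FALSE or NA are dropped
clean <- subset(df, start_pc != "" & end_pc != "")
clean$ID
```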
Upvotes: 8
Reputation: 7141
An elegant solution with dplyr would be:
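library(dplyr)

df %>%
  # recode empty strings "" as NA in the character columns
  # (recent dplyr versions reject na_if() on a whole data frame)
  mutate(across(where(is.character), ~ na_if(.x, ""))) %>%
  # remove rows containing NAs
  na.omit()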
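If only start_pc and end_pc should be checked, a filter over just those columns is another option. This is a sketch rather than part of the original answer; it assumes dplyr 1.0.4 or later (for if_all()) and the df from the question, and clean is an illustrative name.

```r
library(dplyr)

# Example data from the question
df <- data.frame(ID = c(1:7),
                 home_pc = c("", "CB4 2DT", "NE5 7TH", "BY5 8IB", "DH4 6PB", "MP9 7GH", "KN4 5GH"),
                 start_pc = c(NA, "Home", "FC5 7YH", "Home", "CB3 5TH", "BV6 5PB", NA),
                 end_pc = c(NA, "CB5 4FG", "Home", "", "Home", "", NA))

# Keep rows where both columns are neither NA nor ""
clean <- df %>%
  filter(if_all(c(start_pc, end_pc), ~ !is.na(.x) & .x != ""))
clean$ID
```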
Upvotes: 23
Reputation: 179398
It is the same construct: simply test for empty strings rather than NA.
Try this:
df <- df[-which(df$start_pc == ""), ]
In fact, looking at your code, you don't need which(). Negating the logical test is simpler and also safer: when nothing matches, -which() gives an empty index and drops every row. So you can simplify it to:
df <- df[!(df$start_pc == ""), ]
df <- df[!is.na(df$start_pc), ]
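The difference between -which() and plain negation shows up when nothing matches the test. A tiny sketch on a made-up vector (x, idx, via_which and via_negation are illustrative names):

```r
x <- c("a", "b", "c")          # no empty strings here

idx <- which(x == "")          # integer(0): nothing matched
via_which <- x[-idx]           # negating an empty index still selects nothing
via_negation <- x[!(x == "")]  # logical negation keeps every element

length(via_which)
length(via_negation)
```

The first length is 0 and the second is 3, so the -which() form silently discards all the data in this case.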
And, of course, you can combine these two statements as follows:
df <- df[!(df$start_pc == "" | is.na(df$start_pc)), ]
And simplify it even further using with():
df <- with(df, df[!(start_pc == "" | is.na(start_pc)), ])
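You can also test for non-zero string length using nzchar(). Note that nzchar() is TRUE for non-empty strings (and, by default, for NA), so you keep the rows where it is TRUE and the value is not NA:
df <- with(df, df[nzchar(start_pc) & !is.na(start_pc), ])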
Disclaimer: I didn't test any of this code. Please let me know if there are syntax errors anywhere
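To cover both start_pc and end_pc in one go, the combined test can be wrapped in a small helper. This is a sketch assuming the df from the question; is_blank, keep and clean are hypothetical names, not base R functions.

```r
# Example data from the question
df <- data.frame(ID = c(1:7),
                 home_pc = c("", "CB4 2DT", "NE5 7TH", "BY5 8IB", "DH4 6PB", "MP9 7GH", "KN4 5GH"),
                 start_pc = c(NA, "Home", "FC5 7YH", "Home", "CB3 5TH", "BV6 5PB", NA),
                 end_pc = c(NA, "CB5 4FG", "Home", "", "Home", "", NA))

# Hypothetical helper: TRUE where a value is NA or an empty string
is_blank <- function(x) is.na(x) | !nzchar(x)

# Keep rows where neither column is blank
keep <- !(is_blank(df$start_pc) | is_blank(df$end_pc))
clean <- df[keep, ]
clean$ID
```

Because is_blank() never returns NA, the logical index contains no NAs and no all-NA rows creep into the result.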
Upvotes: 34