Reputation: 555
I'm trying to remove rows from a dataframe.
I need to remove rows that have only "NONE" or white space across the entire range of columns I provide. The rows that need to be removed must satisfy these conditions 1. a combination of only "NONE" and white space or all "NONE" or all white space.
Because there are cases where having "NONE" or white space in some of the columns is okay I can't just filter out rows when reading in the csv with something like
dataframe$col1 =="NONE" | str_length(dataframe$col1)==0
I know this would normally be a simple problem where I could run a for loop that turns all "NONE" values and whitespace in a dataframe to NA and use complete.cases
across whichever columns I need (doc). However, I'm specifically being asked to use a method that does not change the values. Any suggestions?
Edit: I don't have the data, but here is a made up example of a dataframe that would be similar to what I have to work with
In this dataframe the only row that must be removed is row 3 (or row 4 if you include the header).
The final dataset will have many more columns than this made up example
Upvotes: 1
Views: 6375
Reputation: 10128
is.none <- function(x) tolower(x) == "none"
is.whitespace <- function(x) grepl("^\\s+$", x)
is.empty <- function(x) length(x) == 0 || x == "" || is.na(x) || is.nan(x)
is.none.whitespace.empty <- function(x) is.none(x) || is.whitespace(x) || is.empty(x)
is.none.whitespace.empty <- Vectorize(is.none.whitespace.empty)
remove.empty.rows <- function(df, cols) {
df[!sapply(1:nrow(df),
function(i) all(is.none.whitespace.empty(df[i, cols]))), ]
}
Now you can test it:
# in your case:
remove.empty.rows(df, 1) # remove if first column content is "empty"
# but you can determine which columns should be examined for being all
# "empty".
# let's say, you want to evaluate only first, third and fifth column:
remove.empty.rows(df, c(1, 3, 5))
Upvotes: 1
Reputation: 695
You can use dplyr::filter_all()
to accomplish this:
library(dplyr)
df <- data.frame(column.1 = c('a', 'b', 'NONE', 'b', 'b'),
column.2 = c('a', 'b', '', 'b', 'b'),
column.3 = rep('', 5),
column.4 = rep('', 5),
column.5 = rep('', 5))
df %>%
filter_all(any_vars(. != 'NONE' & . != ''))
Upvotes: 3
Reputation: 144
Since the rows with NONE and whitespace need to be removed, it would be:
dataframe <- filter(dataframe, col1 != "NONE" & str_length(col1) != 0)
Upvotes: 2
Reputation: 19
I would recommend using the filter()
command from the dplyr package (part of the tidyverse library). It would look something like this:
dataframe_new <- filter(dataframe, col1 == "" | str_length(col1) == 0)
Upvotes: 1