Analytical Melody
Analytical Melody

Reputation: 13

I need to remove all rows that are missing values in either of two columns in a data frame

I have a very very large data frame. It has 10 columns and millions of rows.

I have two columns that have a lot of empty fields. both columns are in chr format. I need to remove the rows with empty fields in either column.

I will try to provide an example:

library(tidyverse)

tibble(
  start_loc = c("A St", "B St", ""),
  end_loc = c("D St", "", "F St"),
  id = c("m", "c", "m")
) %>% 
  {. ->> my_data}

my_data

# A tibble: 3 x 3
  start_loc end_loc id   
  <chr>     <chr>   <chr>
1 "A St"    "D St"  m    
2 "B St"    ""      c    
3 ""        "F St"  m 

So, I need to remove all rows that have blank fields in start_loc and end_loc.

Upvotes: 1

Views: 769

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 389235

Using rowSums in base R -

my_data[rowSums(my_data[1:2] == '') == 0, ]

#  start_loc end_loc id   
#  <chr>     <chr>   <chr>
#1 A St      D St    m    

Upvotes: 0

akrun
akrun

Reputation: 887821

We may use nzchar with if_all (assuming empty is "" and not NA)

library(dplyr)
my_data %>% 
   filter(if_all(c(start_loc, end_loc), nzchar))

-output

# A tibble: 1 x 3
  start_loc end_loc id   
  <chr>     <chr>   <chr>
1 A St      D St    m    

Or use if_any and negate (!)

my_data %>%
    filter(!if_any(c(start_loc, end_loc), `==`, ""))
# A tibble: 1 x 3
  start_loc end_loc id   
  <chr>     <chr>   <chr>
1 A St      D St    m    

If there are both "" and NA, we can use

my_data %>%
    filter(!if_any(c(start_loc, end_loc), ~ is.na(.)|. == ""))
# A tibble: 1 x 3
  start_loc end_loc id   
  <chr>     <chr>   <chr>
1 A St      D St    m    

Upvotes: 3

Related Questions