Reputation: 13
I have a very very large data frame. It has 10 columns and millions of rows.
I have two columns that have a lot of empty fields. both columns are in chr format. I need to remove the rows with empty fields in either column.
I will try to provide an example:
library(tidyverse)
tibble(
start_loc = c("A St", "B St", ""),
end_loc = c("D St", "", "F St"),
id = c("m", "c", "m")
) %>%
{. ->> my_data}
my_data
# A tibble: 3 x 3
start_loc end_loc id
<chr> <chr> <chr>
1 "A St" "D St" m
2 "B St" "" c
3 "" "F St" m
So, I need to remove all rows that have blank fields in start_loc
and end_loc
.
Upvotes: 1
Views: 769
Reputation: 389235
Using rowSums
in base R -
my_data[rowSums(my_data[1:2] == '') == 0, ]
# start_loc end_loc id
# <chr> <chr> <chr>
#1 A St D St m
Upvotes: 0
Reputation: 887821
We may use nzchar
with if_all
(assuming empty is ""
and not NA
)
library(dplyr)
my_data %>%
filter(if_all(c(start_loc, end_loc), nzchar))
-output
# A tibble: 1 x 3
start_loc end_loc id
<chr> <chr> <chr>
1 A St D St m
Or use if_any
and negate (!
)
my_data %>%
filter(!if_any(c(start_loc, end_loc), `==`, ""))
# A tibble: 1 x 3
start_loc end_loc id
<chr> <chr> <chr>
1 A St D St m
If there are both ""
and NA
, we can use
my_data %>%
filter(!if_any(c(start_loc, end_loc), ~ is.na(.)|. == ""))
# A tibble: 1 x 3
start_loc end_loc id
<chr> <chr> <chr>
1 A St D St m
Upvotes: 3