Reputation: 623
I scanned stackoverflow for more than an hour to find a solution, but failed. So posting the question.
I have a dataframe from where I need to remove duplicates, but the trick is, the duplicate values can be in two different columns for separate rows. I need to remove the row where another column has an NA value.
Example data frame
Act Func Func_2
generate numbers odd
generate numbers and
generate print <NA>
generate column print
displays time <NA>
displays date time
displays print time
displays task <NA>
Since print is present in Func as well as Func_2 with same Act value in both rows, I need to remove the row where NA is present in Func_2. However, if the value in Act column would have been different, I would need to keep both rows.
Expected data frame
Act Func Func_2
generate numbers odd
generate numbers and
generate column print
displays date time
displays print time
displays task <NA>
Upvotes: 2
Views: 237
Reputation: 23747
Try this one here:
df1 %>% group_by(Act) %>% # the following test will be done by group
mutate(test = if_else(Func %in% Func_2,
if_else(is.na(Func_2), FALSE, TRUE),
TRUE)) %>%
#this will create a logical helper column.
filter(test == TRUE) #just for completeness
# A tibble: 6 x 4
# Groups: Act [2]
Act Func Func_2 test
<chr> <chr> <chr> <lgl>
1 generate numbers odd T
2 generate numbers and T
3 generate column print T
4 displays date time T
5 displays print time T
6 displays task <NA> T
Upvotes: 1
Reputation: 572
Is this what you're after?
library(dplyr)
Make dataframe:
a <- c(1, 2, 3, 4)
b <- c(5, 6, NA, 7)
df <- data.frame(a, b)
Filter rows to ensure neither column is missing:
df <- filter(df, !is.na(a) & !is.na(b))
Upvotes: 0