H.Stevens
H.Stevens

Reputation: 399

Removing duplicates if there is NA in one of the duplicates in R

I am trying to remove duplicates from a dataset (caused by merging). However, one row contains a value and one does not, in some cases both rows are NA. I want to keep the ones with data, and if there are on NAs, then it does not matter which I keep. How do I do that? I am stuck.

I unsuccessfully tried the solutions from here (also not usually working with data.table, so I dont understand whats what)

R data.table remove rows where one column is duplicated if another column is NA

Some minimum example data:

df <- data.frame(ID = c("A", "A", "B", "B", "C", "D", "E", "G", "H", "J", "J"),
                 value = c(NA, 1L, NA, NA, 1L, 1L, 1L, 1L, 1L, NA, 1L))

ID value
A    NA
A     1
B    NA
B    NA
C     1
D     1
E     1
G     1
H     1
J    NA
J     1

and I want this:

ID value
A     1
B    NA
C     1
D     1
E     1
G     1
H     1
J     1

Upvotes: 4

Views: 2102

Answers (3)

s_baldur
s_baldur

Reputation: 33488

Here is a relatively simple data.table solution.

Grouping by ID if all the values are NA just take the first value, if not take all values that are not NA.

library(data.table)
setDT(df)
df[, if (all(is.na(value))) value[1] else value[!is.na(value)], by = ID]

Upvotes: 4

Darren Tsai
Darren Tsai

Reputation: 35554

An alternative of @tmfmnk's answer with slice_max() in dplyr.

library(dplyr)

df %>%
  group_by(ID) %>%
  slice_max(!is.na(value), with_ties = F)

# # A tibble: 8 x 2
# # Groups:   ID [8]
#   ID    value
#   <chr> <int>
# 1 A         1
# 2 B        NA
# 3 C         1
# 4 D         1
# 5 E         1
# 6 G         1
# 7 H         1
# 8 J         1

Upvotes: 3

tmfmnk
tmfmnk

Reputation: 39858

One possibility using dplyr could be:

df %>%
 group_by(ID) %>%
 slice(which.max(!is.na(value)))

  ID    value
  <chr> <int>
1 A         1
2 B        NA
3 C         1
4 D         1
5 E         1
6 G         1
7 H         1
8 J         1

Upvotes: 8

Related Questions