Reputation: 877
I would like to eliminate all duplicates except NA values.
I have this file:
Name weight
John 10
John 12
NA 12
NA 12
NA 13
Peter 15
Andy 16
Clark 17
And I need this:
Name weight
NA 12
NA 12
NA 13
Peter 15
Andy 16
Clark 17
I tried this code:
New.dt=dt[!(duplicated(dt$Name) | duplicated(dt$Name, fromLast = TRUE)), ]
But I get this:
Name weight
Peter 15
Andy 16
Clark 17
And I want to keep the NA values.
Upvotes: 2
Views: 771
Reputation: 160407
The double-tap of duplicated is faster (I thought duplicated would be slightly less efficient with larger data), so I suggest you go with that answer.
My answer is kept for the record.
One problem with using duplicated on its own is that it never removes every copy of a repeated value: once it has removed all but one of them, the remaining one is no longer a duplicate.
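To make that point concrete, here is a small sketch. The vector nm is a hypothetical stand-in for the Name column from the question. It compares a single call to duplicated with the two-direction version used in the question, which also flags the NA entries, since duplicated treats NA values as equal to each other.

```r
# Hypothetical vector mirroring the Name column from the question.
nm <- c("John", "John", NA, NA, NA, "Peter", "Andy", "Clark")

# A single call flags only the second and later occurrences,
# so the first "John" and the first NA are not flagged.
single <- duplicated(nm)

# Scanning from both ends flags every member of a repeated group,
# including all of the NA entries.
dup_all <- duplicated(nm) | duplicated(nm, fromLast = TRUE)

print(single)
print(dup_all)
```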
A one-liner:
x[ !x$Name %in% names(Filter(c, table(x$Name, useNA = "no") - 1)), ]
# Name weight
# 3 <NA> 12
# 4 <NA> 12
# 5 <NA> 13
# 6 Peter 15
# 7 Andy 16
# 8 Clark 17
Explanation:
- table(x$Name, ...) will give you a named vector with the count of each element within the Name column;
- table(..., useNA="no"): to be explicit, this means that NA values are not included in the returned vector of counts (thereby meeting your "except NA values" constraint);
- Filter(c, ...) filters the named vector based on a truthy-value of the contents, where "0" is considered non-truthy (and therefore removed) ... but since table will always return 1 or more (because it has to find one to include it in the list), ...
- table(...) - 1 to reduce all singles (count of 1) to 0, so that the Filter(c,...) part can work;
- names(...) returns the Name values that have an effective count of 2 or more; and
- !x$Name %in% ... does the actual removal.
Data
x <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Name weight
John 10
John 12
NA 12
NA 12
NA 13
Peter 15
Andy 16
Clark 17")
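The steps in the explanation above can also be run one at a time, which may make the one-liner easier to follow. This is only a sketch: the intermediate names counts, repeated and result are introduced here for illustration, and the data frame is rebuilt inline so the block runs on its own.

```r
# Same data as above, rebuilt inline so this block is self-contained.
x <- data.frame(
  Name   = c("John", "John", NA, NA, NA, "Peter", "Andy", "Clark"),
  weight = c(10, 12, 12, 12, 13, 15, 16, 17),
  stringsAsFactors = FALSE
)

# Step 1: count each non-NA name (NA is left out of the table).
counts <- table(x$Name, useNA = "no")

# Step 2: subtract 1 so singles become 0, then keep the non-zero entries.
repeated <- names(Filter(c, counts - 1))

# Step 3: drop rows whose Name is repeated. NA %in% ... is FALSE,
# so the NA rows are never dropped.
result <- x[!x$Name %in% repeated, ]
print(result)
```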
Upvotes: 1
Reputation: 649
Quick and dirty
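# Drop every row whose Name occurs more than once (this also drops the NA rows)
New.dt=dt[!(duplicated(dt$Name) | duplicated(dt$Name, fromLast = TRUE)), ]
# Pull out the rows where Name is NA
dt2 = dt[is.na(dt$Name), ]
# Put the NA rows back (they end up after the other rows)
rbind(New.dt, dt2)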
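If the original row order matters, the two steps can probably be folded into a single filter. The sketch below is a variant, not the code from this answer: it assumes the same dt as in the question (rebuilt here so the block runs on its own), and the name dup_all is introduced only for illustration.

```r
# Data from the question, rebuilt so the block is self-contained.
dt <- data.frame(
  Name   = c("John", "John", NA, NA, NA, "Peter", "Andy", "Clark"),
  weight = c(10, 12, 12, 12, 13, 15, 16, 17),
  stringsAsFactors = FALSE
)

# TRUE for every row whose Name occurs more than once (NA rows included).
dup_all <- duplicated(dt$Name) | duplicated(dt$Name, fromLast = TRUE)

# Keep a row if its Name is NA, or if its Name is not repeated.
New.dt <- dt[is.na(dt$Name) | !dup_all, ]
print(New.dt)
```

Because the rows are filtered in place and not rebound afterwards, the NA rows stay where they were in dt.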
Upvotes: 4