Reputation: 500
Code for example data:
mydf<-data.frame(Group_ID=c("337", "337", "201", "201", "470", "470", "999", "999"),
Timestamp=c("A", "A", "B", "B", "C", "D", "E", "F"),
MU=c("1", "1", "2", "3", "4", "4", "5", "6"))
Example dataframe:
Group_ID Timestamp MU
1 337 A 1
2 337 A 1
3 201 B 2
4 201 B 3
5 470 C 4
6 470 D 4
7 999 E 5
8 999 F 6
Within "Group_ID", I want to retain only entries where neither "Timestamp" nor "MU" is duplicated. Thus, in this example, only rows 7 & 8 would be retained ("Group_ID" 999 has unique entries for both "Timestamp" and "MU").
Some of my attempts:
mydf<-mydf %>%
group_by(Group_ID) %>%
filter(unique(Timestamp))
Returns error:
"Argument 2 filter condition does not evaluate to a logical vector"
If this did work, I'd then run it again using unique(MU)
mydf<-mydf %>%
group_by(Group_ID) %>%
mydf[!duplicated(mydf$Timestamp, fromLast = TRUE),]
Returns error:
"Length of logical index vector must be 1 or 3 (the number of columns), not 8"
(Again, I'd run the code again using MU)
I've looked through similar questions but haven't found one with the same scenario. Many thanks in advance.
Upvotes: 3
Views: 340
Reputation: 47300
Here's a base solution :
is.unique <- function(x) !(duplicated(x) | duplicated(x, fromLast = TRUE))
mydf[is.unique(mydf[1:2]) & is.unique(mydf[c(1,3)]),]
#> Group_ID Timestamp MU
#> 7 999 E 5
#> 8 999 F 6
Upvotes: 0
Reputation: 886938
If we are using filter
, it expects a logical vector. The output of unique
is just unique elemnents of that column (character
class). So it is not going to work. We can use duplicated
to get a logical vector of duplicate elements, negate (!
). so that the TRUE -> FALSE and viceversa to get only the first non-duplicate element
library(dplyr)
mydf %>%
group_by(Group_ID) %>%
filter(!(duplicated(Timestamp, fromLast = TRUE)| duplicated(Timestamp)))
Or group by both 'Group_ID', 'Timestamp' and filter
based on the number of rows
mydf %>%
group_by(Group_ID, Timestamp) %>%
filter(n() == 1)
If we need only the '999' 'Group_ID'
mydf %>%
group_by(Group_ID) %>%
filter_at(vars(Timestamp, MU), all_vars(n_distinct(.) == n()))
# A tibble: 2 x 3
# Groups: Group_ID [1]
# Group_ID Timestamp MU
# <fct> <fct> <fct>
#1 999 E 5
#2 999 F 6
Or use distinct
directly
distinct(mydf, Group_ID, Timestamp, .keep_all = TRUE)
Upvotes: 5
Reputation: 32538
foo = function(x, f){
ave(as.numeric(as.factor(x)),
f,
FUN = function(y) length(unique(y)) == length(y))
}
inds = Reduce("&", lapply(mydf[c("Timestamp", "MU")],
function(x) foo(x, mydf$Group_ID) == 1))
mydf[inds,]
# Group_ID Timestamp MU
#7 999 E 5
#8 999 F 6
Upvotes: 2