R: Subset between group values and several columns

I have a df like that:

> head(dt)

     Ref.     Date1               Date2     Value
 1: 2018146   2018-01-25      2018-02-25    9281.85
 2: 2018146   2018-01-25      2018-04-25    9281.85
 3: 2018150   2018-01-25      2018-02-25    5865.48
 4: 2018150   2018-01-25      2018-04-25    5865.48
 5: 2018306   2018-01-31      2018-03-25    9596.81
 6: 2010306   2018-01-31      2018-05-01    9596.81

I have Ref. duplicates because in the value Date, Date2 and Value there are some differences.

I want to keep only the Ref. that have differences between all the other columns:

for example:

I want to keep that rows:

Ref.     Date1               Date2     Value
2016487   2017-05-01      2017-06-30   -5343.68
2016487   2016-04-29      2016-05-31   -7451.18

I want to skip that rows (just differing in one column not in all 3):

  Ref.     Date1       Date2     Value
 2015778 2016-10-27 2016-12-26 -108.01
 2015778 2016-10-27 2016-12-26 -108.00

I have tried to do that:

x %>% group_by(Invoice) %>% 
filter(!duplicated(x$Date1)&!duplicated(x$Date2)&!duplicated(x$Value))

But not working.

Upvotes: 0

Views: 146

Answers (1)

akrun
akrun

Reputation: 887108

It seems to be a data.table. After grouping by 'Ref.', unlist the Subset of data.table (.SD), if the length of unlisted elements are equal to the length of unique elements (uniqueN), then get the Subset of Data.table

library(data.table)
dt[, {
        un <- unlist(.SD)
        if(length(un) == uniqueN(un)) .SD
      }, Ref.]

NOTE: Here we assume that all the values are distinct for each 'Ref.' i.e. even 'Date1' and 'Date2' values are not the same.

If the condition is to only check the unique values in each column and not across all the columns

dt[dt[, .I[Reduce(`&`, lapply(.SD, function(x) uniqueN(x) == 1))], Ref.]$V1]

With dplyr, we can use filter_all

library(dplyr)
dt %>% 
   group_by(Ref.) %>%
   filter_all(all_vars(n_distinct(.) == n()))

Upvotes: 1

Related Questions