Reputation: 123
I have a data.table like this:
> head(dt)
      Ref.      Date1      Date2   Value
1: 2018146 2018-01-25 2018-02-25 9281.85
2: 2018146 2018-01-25 2018-04-25 9281.85
3: 2018150 2018-01-25 2018-02-25 5865.48
4: 2018150 2018-01-25 2018-04-25 5865.48
5: 2018306 2018-01-31 2018-03-25 9596.81
6: 2010306 2018-01-31 2018-05-01 9596.81
I have duplicate Ref. values because there are some differences in the Date1, Date2, and Value columns.
I want to keep only the Ref. groups that differ in all three of the other columns.
For example, I want to keep these rows:
   Ref.      Date1      Date2    Value
2016487 2017-05-01 2017-06-30 -5343.68
2016487 2016-04-29 2016-05-31 -7451.18
And I want to skip these rows (they differ in only one column, not in all 3):
   Ref.      Date1      Date2   Value
2015778 2016-10-27 2016-12-26 -108.01
2015778 2016-10-27 2016-12-26 -108.00
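For reference, the two example groups above can be rebuilt as a small reproducible data.table (a sketch; the object name dt and the IDate column classes are assumptions, the real data may differ):
library(data.table)
# Toy data built from the keep/skip examples above:
# Ref. 2016487 differs in all three columns, Ref. 2015778 only in Value.
dt <- data.table(
  Ref.  = c(2016487, 2016487, 2015778, 2015778),
  Date1 = as.IDate(c("2017-05-01", "2016-04-29", "2016-10-27", "2016-10-27")),
  Date2 = as.IDate(c("2017-06-30", "2016-05-31", "2016-12-26", "2016-12-26")),
  Value = c(-5343.68, -7451.18, -108.01, -108.00)
)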
I have tried this:
x %>%
  group_by(Ref.) %>%
  filter(!duplicated(x$Date1) & !duplicated(x$Date2) & !duplicated(x$Value))
But it is not working.
Upvotes: 0
Views: 146
Reputation: 887108
It seems to be a data.table. After grouping by 'Ref.', unlist the Subset of Data.table (.SD); if the length of the unlisted elements is equal to the number of unique elements (uniqueN), then return the Subset of Data.table:
library(data.table)
dt[, {
   un <- unlist(.SD)                  # flatten all non-grouping columns into one vector
   if(length(un) == uniqueN(un)) .SD  # return the group only if every value is distinct
}, Ref.]
NOTE: Here we assume that all the values within each 'Ref.' group are distinct, i.e. even a 'Date1' value and a 'Date2' value are never the same.
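On the toy data sketched in the question (an assumption, not the asker's real data), the check can be run by hand; note that unlist() coerces the IDate columns to their underlying day counts, so dates and amounts are compared on one numeric scale:
un <- unlist(dt[Ref. == 2016487, .(Date1, Date2, Value)])
length(un) == uniqueN(un)
# [1] TRUE -> all six values are distinct, so the 2016487 group is kept;
# the same test on the 2015778 group returns FALSE (Date1 and Date2 repeat)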
If the condition is to check only for unique values within each column, and not across all the columns:
dt[dt[, .I[Reduce(`&`, lapply(.SD, function(x) uniqueN(x) == .N))], Ref.]$V1]
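To unpack that line (again on the hypothetical toy data): the inner query returns, per 'Ref.', the global row numbers .I of the groups in which every column is duplicate-free, stored in an auto-named column V1; the outer dt[...] then subsets by those row numbers:
idx <- dt[, .I[Reduce(`&`, lapply(.SD, function(x) uniqueN(x) == .N))], Ref.]
idx$V1     # on the toy data: 1 2 (the two 2016487 rows)
dt[idx$V1] # keeps only the 2016487 group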
With dplyr, we can use filter_all:
library(dplyr)
dt %>%
group_by(Ref.) %>%
filter_all(all_vars(n_distinct(.) == n()))
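filter_all() has since been superseded; in current dplyr (>= 1.0.4, where if_all() was introduced) the same per-column check can be sketched like this (if_all() skips the grouping column, just as filter_all() did):
dt %>%
  group_by(Ref.) %>%
  filter(if_all(everything(), ~ n_distinct(.x) == n())) %>%
  ungroup()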
Upvotes: 1