Reputation: 41
I am trying to remove duplicate rows based on the value of a factor variable, type. When a row is duplicated and one copy has type "desired" while the other has "not desired", I would like to keep the "desired" row and remove the other one. The "desired" row sometimes appears as the first duplicate and sometimes as the second.
In addition, the duplicate_flag column starts counting for thirty days once either "desired" or "not desired" appears. When type is NA and no count has started yet, duplicate_flag is also NA.
In the end, there should be 1 row per brand per day.
A sample of the data at hand:
brand  date       sales  orders  customers  type         duplicate_flag
A      10/1/2018  100    5       4          NA           NA
A      10/2/2018  150    8       6          desired      1
A      10/2/2018  150    8       6          not desired  1
A      10/3/2018  110    5       4          NA           2
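A reproducible version of the sample above, as a minimal sketch: the object name sample_df is hypothetical, and parsing date with as.Date() and the "%m/%d/%Y" format is an assumption about how the dates are stored. Setting the factor levels explicitly puts "desired" first.

```r
library(dplyr)

# Hypothetical name for the question's sample data
sample_df <- tibble(
  brand = c("A", "A", "A", "A"),
  # Assumption: dates are month/day/year strings
  date = as.Date(c("10/1/2018", "10/2/2018", "10/2/2018", "10/3/2018"),
                 format = "%m/%d/%Y"),
  sales = c(100, 150, 150, 110),
  orders = c(5, 8, 8, 5),
  customers = c(4, 6, 6, 4),
  # "desired" is the first level, so it sorts before "not desired"
  type = factor(c(NA, "desired", "not desired", NA),
                levels = c("desired", "not desired")),
  duplicate_flag = c(NA, 1, 1, 2)
)
```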
Desired output:
brand  date       sales  orders  customers  type     duplicate_flag
A      10/1/2018  100    5       4          NA       NA
A      10/2/2018  150    8       6          desired  1
A      10/3/2018  110    5       4          NA       2
If there is a way to do this in dplyr, that would be great.
Thank you!
Upvotes: 0
Views: 890
Reputation: 9570
Here are some usable sample data.
library(dplyr)

df <- tibble(
  Date = c(1, 2, 2, 3, 3, 4),
  Metric = 1:6,
  type = c(NA, "desired", "not desired", "not desired", "desired", "not desired")
)
Which looks like:
# A tibble: 6 x 3
Date Metric type
<dbl> <int> <chr>
1 1 1 <NA>
2 2 2 desired
3 2 3 not desired
4 3 4 not desired
5 3 5 desired
6 4 6 not desired
I am assuming that you want to keep one row per date, based on the type column, but that the other columns may (or may not) differ from each other. (If they never differ, I don't see why it would matter which row you keep.)
For that, the simplest approach is probably to sort the data by type so that the value you want to keep comes first, and then use slice to keep the first entry in each group. Here "desired" sorts before "not desired" alphabetically; if your preferred value does not, change type to a factor with that value as the first level. arrange always places NA values last, so rows with a missing type are only kept when a date has no other row.
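df %>%
  arrange(type) %>%    # "desired" sorts before "not desired"; NA goes last
  group_by(Date) %>%
  slice(1) %>%         # keep the first row within each Date
  ungroup() %>%
  arrange(Date)        # restore date order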
returns:
# A tibble: 4 x 3
Date Metric type
<dbl> <int> <chr>
1 1 1 <NA>
2 2 2 desired
3 3 5 desired
4 4 6 not desired
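The question's data also have a brand column, so a sketch of the same approach for that layout groups by both brand and date. It also sets the factor levels explicitly, as suggested above, rather than relying on alphabetical order. Parsing the dates with as.Date() and the "%m/%d/%Y" format is an assumption about how they are stored.

```r
library(dplyr)

# The question's sample data (assumption: dates are month/day/year strings)
df <- tibble(
  brand = "A",
  date = as.Date(c("10/1/2018", "10/2/2018", "10/2/2018", "10/3/2018"),
                 format = "%m/%d/%Y"),
  sales = c(100, 150, 150, 110),
  orders = c(5, 8, 8, 5),
  customers = c(4, 6, 6, 4),
  type = c(NA, "desired", "not desired", NA),
  duplicate_flag = c(NA, 1, 1, 2)
)

result <- df %>%
  # Make "desired" the first level so it sorts first regardless of spelling
  mutate(type = factor(type, levels = c("desired", "not desired"))) %>%
  # Factors sort by level order; arrange() puts NA last
  arrange(type) %>%
  group_by(brand, date) %>%
  slice(1) %>%         # first row per brand per day
  ungroup() %>%
  arrange(brand, date)
```

This yields one row per brand per day, matching the desired output in the question.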
Upvotes: 2
Reputation: 667
Assuming your data frame is called df:
df %>% filter(type != "not desired" | is.na(type))
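The is.na(type) clause is needed because comparing NA with a string returns NA, and filter() drops rows whose condition is NA. A small sketch using the example data from the previous answer shows the difference. Note that the Date 4 row, which has only a "not desired" entry, is dropped by both filters, so this approach assumes every "not desired" row has a "desired" duplicate.

```r
library(dplyr)

na_comparison <- NA != "not desired"          # NA, not TRUE
keep_na <- NA != "not desired" | is.na(NA)    # NA | TRUE is TRUE

df <- tibble(
  Date = c(1, 2, 2, 3, 3, 4),
  Metric = 1:6,
  type = c(NA, "desired", "not desired", "not desired", "desired", "not desired")
)

# Without the is.na() clause the NA row (Metric 1) is silently dropped
without_na_clause <- df %>% filter(type != "not desired")
# With it, the NA row is kept
with_na_clause <- df %>% filter(type != "not desired" | is.na(type))
```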
Or
df %>% select(-type) %>% distinct()
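The distinct() version works on the question's sample because the duplicated rows are identical apart from type: once that column is dropped, the two 10/2/2018 rows collapse into one. A sketch with that data (dates kept as character strings for simplicity) shows one row per day, but the type column is gone. If the duplicates differed in any other column, distinct() would keep both.

```r
library(dplyr)

# The question's sample data, with dates left as character strings
df <- tibble(
  brand = "A",
  date = c("10/1/2018", "10/2/2018", "10/2/2018", "10/3/2018"),
  sales = c(100, 150, 150, 110),
  orders = c(5, 8, 8, 5),
  customers = c(4, 6, 6, 4),
  type = c(NA, "desired", "not desired", NA),
  duplicate_flag = c(NA, 1, 1, 2)
)

# Dropping type makes the two 10/2/2018 rows identical, so distinct() keeps one
collapsed <- df %>% select(-type) %>% distinct()
```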
Upvotes: 0