Fangyuan
Fangyuan

Reputation: 25

R-Filter Out Data Based on the Multiple Conditions

I have a dataframe. Part of it looks like this:

 PN       Supplier    Date1    Q_QTY  Price  B_QTY     Date2
1023242    1        6/2/2017    1     9.32    250     7/1/2019
1023242    1        6/2/2017    1     8.81    100     5/3/2017
1023242    1        6/2/2017    1     9.50    200     7/1/2019
1023242    2        ""          1     152.26   1      4/9/2019
1023242    1        ""          0     9.32    250     7/1/2019
1023242    1        ""          0     8.81    100     5/3/2017
1023242    1        ""          0     9.50    200     7/1/2019

There are many PN, each PN can be manufactured by different suppliers. For some reasons, there are some "duplicate" records which have all the columns the same except "Date1", like the first row and the fifth row.

I wish the output only includes rows that have no "" in Date1 Column based on the same PN and suppliers, and also includes the row that has the same PN but different supplier, though the Date1 is "".

This is my desired output looks like:

 PN       Supplier    Date1    Q_QTY  Price  B_QTY     Date2
1023242    1        6/2/2017    1     9.32    250     7/1/2019
1023242    1        6/2/2017    1     8.81    100     5/3/2017
1023242    1        6/2/2017    1     9.50    200     7/1/2019
1023242    2        ""          1     152.26   1      4/9/2019

Upvotes: 1

Views: 45

Answers (1)

akrun
akrun

Reputation: 887028

We could use distinct

library(dplyr)
df1 %>%
    distinct(PN, Supplier, Price, B_QTY, .keep_all = TRUE)
#   PN Supplier    Date1 Q_QTY  Price B_QTY    Date2
#1 1023242        1 6/2/2017     1   9.32   250 7/1/2019
#2 1023242        1 6/2/2017     1   8.81   100 5/3/2017
#3 1023242        1 6/2/2017     1   9.50   200 7/1/2019
#4 1023242        2              1 152.26     1 4/9/2019

Or with duplicated from base R

df1[!duplicated(df1[c(1:2, 5:6)]),]

data

df1 <- structure(list(PN = c(1023242L, 1023242L, 1023242L, 1023242L, 
1023242L, 1023242L, 1023242L), Supplier = c(1L, 1L, 1L, 2L, 1L, 
1L, 1L), Date1 = c("6/2/2017", "6/2/2017", "6/2/2017", "", "", 
"", ""), Q_QTY = c(1L, 1L, 1L, 1L, 0L, 0L, 0L), Price = c(9.32, 
8.81, 9.5, 152.26, 9.32, 8.81, 9.5), B_QTY = c(250L, 100L, 200L, 
1L, 250L, 100L, 200L), Date2 = c("7/1/2019", "5/3/2017", "7/1/2019", 
"4/9/2019", "7/1/2019", "5/3/2017", "7/1/2019")), 
class = "data.frame", row.names = c(NA, 
-7L))

Upvotes: 2

Related Questions