Reputation: 25
I have a dataframe. Part of it looks like this:
PN Supplier Date1 Q_QTY Price B_QTY Date2 1023242 1 6/2/2017 1 9.32 250 7/1/2019 1023242 1 6/2/2017 1 8.81 100 5/3/2017 1023242 1 6/2/2017 1 9.50 200 7/1/2019 1023242 2 "" 1 152.26 1 4/9/2019 1023242 1 "" 0 9.32 250 7/1/2019 1023242 1 "" 0 8.81 100 5/3/2017 1023242 1 "" 0 9.50 200 7/1/2019
There are many PN, each PN can be manufactured by different suppliers. For some reasons, there are some "duplicate" records which have all the columns the same except "Date1", like the first row and the fifth row.
I wish the output only includes rows that have no "" in Date1 Column based on the same PN and suppliers, and also includes the row that has the same PN but different supplier, though the Date1 is "".
This is my desired output looks like:
PN Supplier Date1 Q_QTY Price B_QTY Date2 1023242 1 6/2/2017 1 9.32 250 7/1/2019 1023242 1 6/2/2017 1 8.81 100 5/3/2017 1023242 1 6/2/2017 1 9.50 200 7/1/2019 1023242 2 "" 1 152.26 1 4/9/2019
Upvotes: 1
Views: 45
Reputation: 887028
We could use distinct
library(dplyr)
df1 %>%
distinct(PN, Supplier, Price, B_QTY, .keep_all = TRUE)
# PN Supplier Date1 Q_QTY Price B_QTY Date2
#1 1023242 1 6/2/2017 1 9.32 250 7/1/2019
#2 1023242 1 6/2/2017 1 8.81 100 5/3/2017
#3 1023242 1 6/2/2017 1 9.50 200 7/1/2019
#4 1023242 2 1 152.26 1 4/9/2019
Or with duplicated
from base R
df1[!duplicated(df1[c(1:2, 5:6)]),]
df1 <- structure(list(PN = c(1023242L, 1023242L, 1023242L, 1023242L,
1023242L, 1023242L, 1023242L), Supplier = c(1L, 1L, 1L, 2L, 1L,
1L, 1L), Date1 = c("6/2/2017", "6/2/2017", "6/2/2017", "", "",
"", ""), Q_QTY = c(1L, 1L, 1L, 1L, 0L, 0L, 0L), Price = c(9.32,
8.81, 9.5, 152.26, 9.32, 8.81, 9.5), B_QTY = c(250L, 100L, 200L,
1L, 250L, 100L, 200L), Date2 = c("7/1/2019", "5/3/2017", "7/1/2019",
"4/9/2019", "7/1/2019", "5/3/2017", "7/1/2019")),
class = "data.frame", row.names = c(NA,
-7L))
Upvotes: 2