Red Vibes
Red Vibes

Reputation: 79

Filter rows based on certain conditions in R

I'm just a beginner in R and I am unable to think a very complex logic. Let me tell u

I have my data

Serial no Invoice no Date received
0C315 17-jun-20 03-Aug-20
0C8AA 26-jun-20 03-Aug-20
0C8B5 26-jun-20 11-Aug-20
0C8C0 28-jun-20 11-Aug-20
0C8CB 18-jun-20 03-Aug-20
0C8D6 18-jun-20 11-Aug-20
0C8E1 20-jun-20 24-Aug-20
0C8EC 24-jun-20 24-Aug-20
0C8F7 26-jun-20 24-Aug-20

What I have to do is...invoice date is basically manufactured date and date received is the recieved date, the product whose manufactured date is earlier that product must be received earlier, simply first come first served. Now I have to list the details of those products whose manufactured date is earlier but recieved later. For this data... In row 1... invoice date is 17 Jun 20 and received on 3 aug 20 which is fine because 03 aug 20 is the earliest among all rows

In 6 th row (count row no. exclude column name)-> 0C8D6...

invoice date - 18 Jun 20 and

received on 11 aug 20.

Now compare this with evry rows..you will find in the 2nd row

invoice date is 26 Jun 20 but recieved on 3 aug 20(manufactured late and recieved earlier than row 6th)

which is not supposed to be, 6 th row should be received earlier not the 2nd row.So 6 th row should be listed down.

Same for last row

Manufactured on - 26 Jun 20

Recieved on 24 Aug 20

But on comparing, found a row which is 4th row....

mfd on 28 Jun 20

And recieved on 11 aug 20(manufactured late but recieved earlier than last row)

Last row should be listed down

My output should be

Serial no Invoice no Date received
0C8D6 18-jun-20 11-Aug-20
0C8E1 20-jun-20 24-Aug-20
0C8EC 24-jun-20 24-Aug-20
0C8F7 26-jun-20 24-Aug-20

I hope I have made clear what my problem is...let me know if you have any doubt regarding the explanation. Please help me to code the problem

Thanks in advance

Upvotes: 0

Views: 67

Answers (1)

lead-y
lead-y

Reputation: 66

First, you'll need to get your table into a dataframe/tibble with legal column names. If you're not familiar with dates in R, look up as.Date().

Then you can do something like this:

library(dplyr)

is_this_order_late <- function(df, row) {
  this_date_ordered <- df[row, "Invoice.no"]
  this_date_recieved <- df[row, "Date.received"]
  # Orders which were placed after this one but were received earlier
  quicker_orders <- df %>%
    filter(
      Invoice.no > this_date_ordered, Date.received < this_date_recieved
    )
  # If there were no orders that were placed after this one but received earlier, return FALSE;
  # Otherise, return TRUE
  is_late <- as.logical(nrow(quicker_orders))
  return(is_late)
}

df[sapply(1:nrow(df), function(row) {is_this_order_late(df, row)}),]

I made a function that takes in a row and determines whether or not the order corresponding to that row was received late (according to the condition you specified). Then I applied that function to every row in your dataframe using sapply(), and then filtered the dataframe using a boolean mask.

If you're not already familiar with them, you may want to look up boolean masks, dplyr's filter(), and the sapply()/lapply() family of functions in R.

Upvotes: 1

Related Questions