person10559
person10559

Reputation: 57

R Increasing Variable Based on Previous Occurrences

I have a data frame of restaurant inspections ordered by date. For each observation, I want to add two additional variables to note how many total inspections this restaurant has had, and how many times they have failed. I would like to avoid using a for loop but I'm not sure how else to do this. Essentially, I currently have a data frame consisting of the first three columns of the data frame below and I would like to add the last two columns.

Initial data frame

    Restaurant_ID    Date         Result
    1                01/02/2011   Pass 
    2                02/05/2011   Pass
    3                04/07/2011   Fail
    1                09/05/2011   Fail
    2                03/13/2012   Pass
    1                08/25/2012   Fail
    2                09/25/2012   Pass
    3                01/05/2013   Pass

Desired output 1

Restaurant_ID    Date         Result   total_inspect  failed_inspect
1                01/02/2011   Pass     1              0
2                02/05/2011   Pass     1              0
3                04/07/2011   Fail     1              1
1                09/05/2011   Fail     2              1
2                03/13/2012   Pass     2              0
1                08/25/2012   Fail     3              2
2                09/25/2012   Pass     3              0
3                01/05/2013   Pass     2              1

EDIT: I've realized that I actually want the last two columns to reflect the number of total and failed inspections prior to the current observation. So what I actually want is

Desired output 2

    Restaurant_ID    Date         Result   past_inspect  past_failed_inspect
    1                01/02/2011   Pass     0              0
    2                02/05/2011   Pass     0              0
    3                04/07/2011   Fail     0              0
    1                09/05/2011   Fail     1              0
    2                03/13/2012   Pass     1              0
    1                08/25/2012   Fail     2              1
    2                09/25/2012   Pass     2              0
    3                01/05/2013   Pass     1              1

Upvotes: 0

Views: 70

Answers (1)

www
www

Reputation: 39154

This solution uses functions from the package tidyverse and lubridate.

# Create the example data frame
dt1 <- read.csv(text = "Restaurant_ID,Date,Result
1,01/02/2011,Pass
2,02/05/2011,Pass
3,04/07/2011,Fail
1,09/05/2011,Fail
2,03/13/2012,Pass
1,08/25/2012,Fail
2,09/25/2012,Pass
               3,01/05/2013,Pass",
               stringsAsFactors = FALSE)

# Load packages
library(tidyverse)
library(lubridate)

dt2 <- dt1 %>%
  # Convert the Date column to Date class
  mutate(Date = mdy(Date)) %>%
  # Sort the data frame based on Restaurant_ID and Date
  arrange(Restaurant_ID, Date) %>%
  # group the data by each restaurant ID
  group_by(Restaurant_ID) %>%
  # Create a column showing total_inspect
  mutate(total_inspect = 1:n()) %>%
  # Create a column showing fail_result, fail is 1, pass is 0
  mutate(fail_result = ifelse(Result == "Fail", 1, 0)) %>%
  # Calculate the cumulative sum of fail_result
  mutate(failed_inspect = cumsum(fail_result)) %>%
  # Remove fail_result
  select(-fail_result) %>%
  # Sort the data frame by Date
  arrange(Date)

Edit: Calculate the past inspection and fail count

dt3 <- dt2 %>%
  mutate(past_inspect = ifelse(total_inspect == 0, 0, total_inspect - 1)) %>%
  mutate(past_failed_inspect = ifelse(Result == "Fail" & failed_inspect != 0, 
                                      failed_inspect - 1,
                                      failed_inspect)) %>%
  select(-total_inspect, -failed_inspect)

Upvotes: 3

Related Questions