Reputation: 57
I have a data frame of restaurant inspections ordered by date. For each observation, I want to add two additional variables to note how many total inspections this restaurant has had, and how many times they have failed. I would like to avoid using a for loop but I'm not sure how else to do this. Essentially, I currently have a data frame consisting of the first three columns of the data frame below and I would like to add the last two columns.
Restaurant_ID Date Result
1 01/02/2011 Pass
2 02/05/2011 Pass
3 04/07/2011 Fail
1 09/05/2011 Fail
2 03/13/2012 Pass
1 08/25/2012 Fail
2 09/25/2012 Pass
3 01/05/2013 Pass
Restaurant_ID Date Result total_inspect failed_inspect
1 01/02/2011 Pass 1 0
2 02/05/2011 Pass 1 0
3 04/07/2011 Fail 1 1
1 09/05/2011 Fail 2 1
2 03/13/2012 Pass 2 0
1 08/25/2012 Fail 3 2
2 09/25/2012 Pass 3 0
3 01/05/2013 Pass 2 1
EDIT: I've realized that I actually want the last two columns to reflect the number of total and failed inspections prior to the current observation. So what I actually want is
Restaurant_ID Date Result past_inspect past_failed_inspect
1 01/02/2011 Pass 0 0
2 02/05/2011 Pass 0 0
3 04/07/2011 Fail 0 0
1 09/05/2011 Fail 1 0
2 03/13/2012 Pass 1 0
1 08/25/2012 Fail 2 1
2 09/25/2012 Pass 2 0
3 01/05/2013 Pass 1 1
Upvotes: 0
Views: 70
Reputation: 39154
This solution uses functions from the package tidyverse
and lubridate
.
# Create the example data frame
dt1 <- read.csv(text = "Restaurant_ID,Date,Result
1,01/02/2011,Pass
2,02/05/2011,Pass
3,04/07/2011,Fail
1,09/05/2011,Fail
2,03/13/2012,Pass
1,08/25/2012,Fail
2,09/25/2012,Pass
3,01/05/2013,Pass",
stringsAsFactors = FALSE)
# Load packages
library(tidyverse)
library(lubridate)
dt2 <- dt1 %>%
# Convert the Date column to Date class
mutate(Date = mdy(Date)) %>%
# Sort the data frame based on Restaurant_ID and Date
arrange(Restaurant_ID, Date) %>%
# group the data by each restaurant ID
group_by(Restaurant_ID) %>%
# Create a column showing total_inspect
mutate(total_inspect = 1:n()) %>%
# Create a column showing fail_result, fail is 1, pass is 0
mutate(fail_result = ifelse(Result == "Fail", 1, 0)) %>%
# Calculate the cumulative sum of fail_result
mutate(failed_inspect = cumsum(fail_result)) %>%
# Remove fail_result
select(-fail_result) %>%
# Sort the data frame by Date
arrange(Date)
dt3 <- dt2 %>%
mutate(past_inspect = ifelse(total_inspect == 0, 0, total_inspect - 1)) %>%
mutate(past_failed_inspect = ifelse(Result == "Fail" & failed_inspect != 0,
failed_inspect - 1,
failed_inspect)) %>%
select(-total_inspect, -failed_inspect)
Upvotes: 3