Reputation: 911
I have a dataset that looks like the sample below. The MonthYear variable just describes the first occurrence of the UniqueID. The Count1 column counts the number of occurrences of the UniqueID. The Count2 column counts the total number of occurrences of each UniqueID. The MonthCount variable just assigns a number based on the month. I started collecting this information in October 2018, so that value would get a 1 and then November 2018 would get a 2, and so on.
Note: No data was available in both July 2019 and October 2019, so August 2019 gets a value of 10 and November 2019 gets a value of 12.
UniqueID Region City MonthYear Count1 Count2 MonthCount
ABC123 West AAA OCT-18 1 4 1
ABC123 West AAA NOV-18 2 4 2
ABC123 West AAA DEC-18 3 4 3
ABC123 West AAA JAN-19 4 4 4
DEF456 East BBB DEC-18 1 3 3
DEF456 East BBB JAN-19 2 3 4
DEF456 East BBB MAR-19 3 3 6
GHI789 East CCC JAN-19 1 4 4
GHI789 East CCC FEB-19 2 4 5
GHI789 East CCC APR-19 3 4 7
GHI789 East CCC JUN-19 4 4 9
JKL012 South DDD AUG-19 1 4 10
JKL012 South DDD SEP-19 2 4 11
JKL012 South DDD NOV-19 3 4 12
JKL012 South DDD DEC-19 4 4 13
What I want to do is count the total number of times a month is skipped per UniqueID (with the exception of JUL19 and OCT19). I would have a dataset that looks like the following:
UniqueID Region City MonthYear Count1 Count2 MonthCount Skipped
ABC123 West AAA OCT-18 1 4 1 0
ABC123 West AAA NOV-18 2 4 2 0
ABC123 West AAA DEC-18 3 4 3 0
ABC123 West AAA JAN-19 4 4 4 0
DEF456 East BBB DEC-18 1 3 3 1
DEF456 East BBB JAN-19 2 3 4 1
DEF456 East BBB MAR-19 3 3 6 1
GHI789 East CCC JAN-19 1 4 4 2
GHI789 East CCC FEB-19 2 4 5 2
GHI789 East CCC APR-19 3 4 7 2
GHI789 East CCC JUN-19 4 4 9 2
JKL012 South DDD AUG-19 1 4 10 0
JKL012 South DDD SEP-19 2 4 11 0
JKL012 South DDD NOV-19 3 4 12 0
JKL012 South DDD DEC-19 4 4 13 0
Any help would be appreciated! I'm not sure where to start. Thank you!
Upvotes: 2
Views: 75
Reputation: 887148
After grouping by 'UniqueID', can get the diff
of the 'MonthCount', check if any value is greater than 1 i.e. there is a difference in adjacent months greater than 1, and sum
the logical vector
df1 %>%
group_by(UniqueID) %>%
mutate(Skipped = sum(diff(MonthCount) > 1))
# A tibble: 15 x 8
# Groups: UniqueID [4]
# UniqueID Region City MonthYear Count1 Count2 MonthCount Skipped
# <chr> <chr> <chr> <chr> <int> <int> <int> <int>
# 1 ABC123 West AAA OCT-18 1 4 1 0
# 2 ABC123 West AAA NOV-18 2 4 2 0
# 3 ABC123 West AAA DEC-18 3 4 3 0
# 4 ABC123 West AAA JAN-19 4 4 4 0
# 5 DEF456 East BBB DEC-18 1 3 3 1
# 6 DEF456 East BBB JAN-19 2 3 4 1
# 7 DEF456 East BBB MAR-19 3 3 6 1
# 8 GHI789 East CCC JAN-19 1 4 4 2
# 9 GHI789 East CCC FEB-19 2 4 5 2
#10 GHI789 East CCC APR-19 3 4 7 2
#11 GHI789 East CCC JUN-19 4 4 9 2
#12 JKL012 South DDD AUG-19 1 4 10 0
#13 JKL012 South DDD SEP-19 2 4 11 0
#14 JKL012 South DDD NOV-19 3 4 12 0
#15 JKL012 South DDD DEC-19 4 4 13 0
df1 <- structure(list(UniqueID = c("ABC123", "ABC123", "ABC123", "ABC123",
"DEF456", "DEF456", "DEF456", "GHI789", "GHI789", "GHI789", "GHI789",
"JKL012", "JKL012", "JKL012", "JKL012"), Region = c("West", "West",
"West", "West", "East", "East", "East", "East", "East", "East",
"East", "South", "South", "South", "South"), City = c("AAA",
"AAA", "AAA", "AAA", "BBB", "BBB", "BBB", "CCC", "CCC", "CCC",
"CCC", "DDD", "DDD", "DDD", "DDD"), MonthYear = c("OCT-18", "NOV-18",
"DEC-18", "JAN-19", "DEC-18", "JAN-19", "MAR-19", "JAN-19", "FEB-19",
"APR-19", "JUN-19", "AUG-19", "SEP-19", "NOV-19", "DEC-19"),
Count1 = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 3L, 4L, 1L,
2L, 3L, 4L), Count2 = c(4L, 4L, 4L, 4L, 3L, 3L, 3L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L), MonthCount = c(1L, 2L, 3L, 4L, 3L,
4L, 6L, 4L, 5L, 7L, 9L, 10L, 11L, 12L, 13L)), class = "data.frame",
row.names = c(NA,
-15L))
Upvotes: 1