user2813606
user2813606

Reputation: 911

(R) Count Gaps in Between Months

I have a dataset that looks like the sample below. The MonthYear variable just describes the first occurrence of the UniqueID. The Count1 column counts the number of occurrences of the UniqueID. The Count2 column counts the total number of occurrences of each UniqueID. The MonthCount variable just assigns a number based on the month. I started collecting this information in October 2018, so that value would get a 1 and then November 2018 would get a 2, and so on.

Note: No data was available in both July 2019 and October 2019, so August 2019 gets a value of 10 and November 2019 gets a value of 12.

UniqueID Region City MonthYear Count1 Count2 MonthCount
ABC123   West   AAA  OCT-18    1      4      1
ABC123   West   AAA  NOV-18    2      4      2
ABC123   West   AAA  DEC-18    3      4      3 
ABC123   West   AAA  JAN-19    4      4      4
DEF456   East   BBB  DEC-18    1      3      3 
DEF456   East   BBB  JAN-19    2      3      4
DEF456   East   BBB  MAR-19    3      3      6
GHI789   East   CCC  JAN-19    1      4      4
GHI789   East   CCC  FEB-19    2      4      5
GHI789   East   CCC  APR-19    3      4      7
GHI789   East   CCC  JUN-19    4      4      9
JKL012   South  DDD  AUG-19    1      4      10 
JKL012   South  DDD  SEP-19    2      4      11
JKL012   South  DDD  NOV-19    3      4      12
JKL012   South  DDD  DEC-19    4      4      13 

What I want to do is count the total number of times a month is skipped per UniqueID (with the exception of JUL19 and OCT19). I would have a dataset that looks like the following:

UniqueID Region City MonthYear Count1 Count2 MonthCount Skipped
ABC123   West   AAA  OCT-18    1      4      1          0      
ABC123   West   AAA  NOV-18    2      4      2          0
ABC123   West   AAA  DEC-18    3      4      3          0
ABC123   West   AAA  JAN-19    4      4      4          0
DEF456   East   BBB  DEC-18    1      3      3          1
DEF456   East   BBB  JAN-19    2      3      4          1
DEF456   East   BBB  MAR-19    3      3      6          1 
GHI789   East   CCC  JAN-19    1      4      4          2
GHI789   East   CCC  FEB-19    2      4      5          2
GHI789   East   CCC  APR-19    3      4      7          2
GHI789   East   CCC  JUN-19    4      4      9          2
JKL012   South  DDD  AUG-19    1      4      10         0 
JKL012   South  DDD  SEP-19    2      4      11         0
JKL012   South  DDD  NOV-19    3      4      12         0
JKL012   South  DDD  DEC-19    4      4      13         0

Any help would be appreciated! I'm not sure where to start. Thank you!

Upvotes: 2

Views: 75

Answers (1)

akrun
akrun

Reputation: 887148

After grouping by 'UniqueID', can get the diff of the 'MonthCount', check if any value is greater than 1 i.e. there is a difference in adjacent months greater than 1, and sum the logical vector

df1 %>% 
   group_by(UniqueID) %>%
   mutate(Skipped = sum(diff(MonthCount) > 1))
# A tibble: 15 x 8
# Groups:   UniqueID [4]
#   UniqueID Region City  MonthYear Count1 Count2 MonthCount Skipped
#   <chr>    <chr>  <chr> <chr>      <int>  <int>      <int>   <int>
# 1 ABC123   West   AAA   OCT-18         1      4          1       0
# 2 ABC123   West   AAA   NOV-18         2      4          2       0
# 3 ABC123   West   AAA   DEC-18         3      4          3       0
# 4 ABC123   West   AAA   JAN-19         4      4          4       0
# 5 DEF456   East   BBB   DEC-18         1      3          3       1
# 6 DEF456   East   BBB   JAN-19         2      3          4       1
# 7 DEF456   East   BBB   MAR-19         3      3          6       1
# 8 GHI789   East   CCC   JAN-19         1      4          4       2
# 9 GHI789   East   CCC   FEB-19         2      4          5       2
#10 GHI789   East   CCC   APR-19         3      4          7       2
#11 GHI789   East   CCC   JUN-19         4      4          9       2
#12 JKL012   South  DDD   AUG-19         1      4         10       0
#13 JKL012   South  DDD   SEP-19         2      4         11       0
#14 JKL012   South  DDD   NOV-19         3      4         12       0
#15 JKL012   South  DDD   DEC-19         4      4         13       0

data

df1 <- structure(list(UniqueID = c("ABC123", "ABC123", "ABC123", "ABC123", 
"DEF456", "DEF456", "DEF456", "GHI789", "GHI789", "GHI789", "GHI789", 
"JKL012", "JKL012", "JKL012", "JKL012"), Region = c("West", "West", 
"West", "West", "East", "East", "East", "East", "East", "East", 
"East", "South", "South", "South", "South"), City = c("AAA", 
"AAA", "AAA", "AAA", "BBB", "BBB", "BBB", "CCC", "CCC", "CCC", 
"CCC", "DDD", "DDD", "DDD", "DDD"), MonthYear = c("OCT-18", "NOV-18", 
"DEC-18", "JAN-19", "DEC-18", "JAN-19", "MAR-19", "JAN-19", "FEB-19", 
"APR-19", "JUN-19", "AUG-19", "SEP-19", "NOV-19", "DEC-19"), 
    Count1 = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 3L, 4L, 1L, 
    2L, 3L, 4L), Count2 = c(4L, 4L, 4L, 4L, 3L, 3L, 3L, 4L, 4L, 
    4L, 4L, 4L, 4L, 4L, 4L), MonthCount = c(1L, 2L, 3L, 4L, 3L, 
    4L, 6L, 4L, 5L, 7L, 9L, 10L, 11L, 12L, 13L)), class = "data.frame",
    row.names = c(NA, 
-15L))

Upvotes: 1

Related Questions