Reputation: 928
I have a large-ish dataset with a structure something like the following:
structure(list(pathogen = c("MRSA", "L. pneumophila", "MRSA",
"L. pneumophila", "MRSA", "MRSA", "MRSA", "L. pneumophila", "L. pneumophila",
"MRSA"), variant = c("mecA", "sg1", "mecA", "sg1", "mecA", "mecC",
"mecA", "sg1", "sg6", "mecA"), n = c(25L, 14L, 235L, 2L, 64L,
15L, 13L, 6L, 11L, 8L), date = structure(c(15156, 15248, 15279,
15279, 15309, 15340, 15340, 15400, 15431, 15461), class = "Date")), .Names = c("pathogen",
"variant", "n", "date"), row.names = c(NA, -10L), class = "data.frame")
I want to find every row with a combination of variables that has not been recorded in the previous x-month period. So when I look for a combination of pathogen
and variant
that has not been recorded in the previous 3 months I go from:
pathogen variant n date
1 MRSA mecA 25 2011-07-01
2 L. pneumophila sg1 14 2011-10-01
3 MRSA mecA 235 2011-11-01
4 L. pneumophila sg1 2 2011-11-01
5 MRSA mecA 64 2011-12-01
6 MRSA mecC 15 2012-01-01
7 MRSA mecA 13 2012-01-01
8 L. pneumophila sg1 6 2012-03-01
9 L. pneumophila sg6 11 2012-04-01
10 MRSA mecA 8 2012-05-01
to:
pathogen variant n date
1 MRSA mecA 25 2011-07-01
2 L. pneumophila sg1 14 2011-10-01
3 MRSA mecA 235 2011-11-01
6 MRSA mecC 15 2012-01-01
8 L. pneumophila sg1 6 2012-03-01
9 L. pneumophila sg6 11 2012-04-01
10 MRSA mecA 8 2012-05-01
All the solutions I've thought of so far involve writing loops. I'm also trying to use dplyr for analysis as much as possible, so my question is: is this possible in dplyr? And if not, what would an R-ish approach look like?
Upvotes: 3
Views: 215
Reputation: 70256
I'm not sure about the best way to handle exact month intervalls, but to get you started, you could compute the difference in number of days as follows (resulting in the same output as shown in the question):
df %>%
group_by(pathogen, variant) %>%
filter(c(TRUE, diff(date) > 90)) # check for difference of 90 days
#Source: local data frame [7 x 4]
#Groups: pathogen, variant
#
# pathogen variant n date
#1 MRSA mecA 25 2011-07-01
#2 L. pneumophila sg1 14 2011-10-01
#3 MRSA mecA 235 2011-11-01
#4 MRSA mecC 15 2012-01-01
#5 L. pneumophila sg1 6 2012-03-01
#6 L. pneumophila sg6 11 2012-04-01
#7 MRSA mecA 8 2012-05-01
Upvotes: 3