Reputation: 1866
I have a data frame, much like this one:
ref=rep(c("A","B"),each=240)
year=rep(rep(2014:2015,each=120),2)
month=rep(rep(1:12,each=10),4)
values=c(rep(NA,200),rnorm(100,2,1),rep(NA,50),rnorm(40,4,2),rep(NA,90))
DF=data.frame(ref,year,month,values)
I would like to compute the maximum number of consecutive NAs per reference, per year. I have created a function, which works out the maximum number of consecutive NAs, but can only be based on one variable.
For example,
func <- function(x) {
max(rle(is.na(x))$lengths)
}
with(DF, tapply(values,ref, func))
# A B
# 200 90
with(DF, tapply(values,year, func))
# 2014 2015
# 120 90
So there are a maximum of 200 consecutive NAs in ref A in total, and maximum of 90 in ref B, which is correct. There are also 120 NAs in 2014, and 90 in 2015.
What I'd like is a result per ref and year, such as:
A 2015 80
A 2014 120
B 2015 90
B 2014 50
Upvotes: 0
Views: 1959
Reputation: 1399
Very similar to the tapply solution above. I find aggregate
give a better output than tapply
though.
with(DF, aggregate(list(Value = values),list(Year = year,ref = ref), func))
Year ref Value
1 2014 A 120
2 2015 A 80
3 2014 B 60
4 2015 B 90
Upvotes: 1
Reputation: 950
I like the recipe format
library(dplyr)
DF$values[is.na(DF$values)] <- 1
DF %>%
filter(values==1) %>%
group_by(ref,year) %>%
mutate(csum=cumsum(values)) %>%
group_by(ref,year) %>%
summarise(max(csum))
Source: local data frame [4 x 3]
Groups: ref [?]
ref year max(csum)
(fctr) (int) (dbl)
1 A 2014 120
2 A 2015 80
3 B 2014 50
4 B 2015 90
Upvotes: 0
Reputation: 1266
There are multiple ways of doing this, one is with the plyr library:
library(plyr)
ddply(DF,c('ref','year'),summarise,NAs=max(rle(is.na(values))$lengths))
ref year NAs
1 A 2014 120
2 A 2015 80
3 B 2014 60
4 B 2015 90
Using your function, you could also try:
with(DF, tapply(values,list(ref,year), func))
which gives a slightly different output
2014 2015
A 120 80
B 60 90
By using melt() you can however get to the same dataframe.
Upvotes: 1