Reputation: 1866

tapply based on multiple indexes in R

I have a data frame, much like this one:

ref=rep(c("A","B"),each=240)
year=rep(rep(2014:2015,each=120),2)
month=rep(rep(1:12,each=10),4)
values=c(rep(NA,200),rnorm(100,2,1),rep(NA,50),rnorm(40,4,2),rep(NA,90))

DF=data.frame(ref,year,month,values)

I would like to compute the maximum number of consecutive NAs per reference, per year. I have created a function, which works out the maximum number of consecutive NAs, but can only be based on one variable.

For example,

func <-   function(x) { 
  max(rle(is.na(x))$lengths) 
} 

with(DF, tapply(values,ref, func))
# A   B 
# 200  90 

with(DF, tapply(values,year, func))
# 2014 2015 
# 120   90

So there are a maximum of 200 consecutive NAs in ref A in total, and maximum of 90 in ref B, which is correct. There are also 120 NAs in 2014, and 90 in 2015.

What I'd like is a result per ref and year, such as:

Upvotes: 0

Answers (3)

s_scolary

Reputation: 1399

Very similar to the tapply solution above. I find aggregate give a better output than tapply though.

with(DF, aggregate(list(Value = values),list(Year = year,ref = ref), func))

  Year  ref  Value
1 2014   A   120
2 2015   A    80
3 2014   B    60
4 2015   B    90

Upvotes: 1

Adam Birenbaum

Reputation: 950

I like the recipe format

library(dplyr)

DF$values[is.na(DF$values)] <- 1   

DF %>% 
 filter(values==1) %>%  
 group_by(ref,year) %>% 
 mutate(csum=cumsum(values)) %>% 
 group_by(ref,year) %>% 
 summarise(max(csum))

Source: local data frame [4 x 3]
Groups: ref [?]

     ref  year max(csum)
   (fctr) (int)     (dbl)
1      A  2014       120
2      A  2015        80
3      B  2014        50
4      B  2015        90

Upvotes: 0

Wave

Reputation: 1266

There are multiple ways of doing this, one is with the plyr library:

library(plyr)
ddply(DF,c('ref','year'),summarise,NAs=max(rle(is.na(values))$lengths))

  ref year NAs
1   A 2014 120
2   A 2015  80
3   B 2014  60
4   B 2015  90

Using your function, you could also try:

with(DF, tapply(values,list(ref,year), func))

which gives a slightly different output

  2014 2015
A  120   80
B   60   90

By using melt() you can however get to the same dataframe.

Upvotes: 1

tapply based on multiple indexes in R

Answers (3)

Related Questions