Reputation: 974
I want to aggregate group2 based on NAs in group1:
Datetime group1 group2
2011-08-08 21:00:00 1 1
2011-08-08 21:10:00 NA 2
2011-08-08 21:20:00 NA 3
2011-08-08 21:30:00 2 4
2011-08-08 21:40:00 NA 5
2011-08-08 21:50:00 NA 6
2011-08-08 22:00:00 3 7
This is my desired output:
Datetime group1 group2
2011-08-08 21:00:00 1 1
2011-08-08 21:30:00 2 9
2011-08-08 22:00:00 3 18
Edit: 9=2+3+4 and 18=5+6+7.
aggregate(group2~group1, data=Data, subset(Data,group1==NA),sum)
Any suggestion is appreciated. Can I do it with aggregate? or should I use different package?
Upvotes: 2
Views: 121
Reputation: 99321
It looks like na.locf
from package zoo
would be quite useful here.
Assuming dat
is your original data, we can take the dates for the non-NA group1
levels and use cbind
to bring them together with the aggregated group2
data.
> library(zoo)
> Datetime <- dat$Datetime[!is.na(dat$group1)]
> cbind(Datetime, aggregate(group2~group1, na.locf(dat, fromLast = TRUE), sum))
# Datetime group1 group2
# 1 2011-08-08 21:00:00 1 1
# 2 2011-08-08 21:30:00 2 9
# 3 2011-08-08 22:00:00 3 18
PS: Thanks for updating/editing your question (+1).
Upvotes: 2
Reputation: 24535
A solution using base R:
ddf = structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "2011-08-08", class = "factor"),
time = structure(1:7, .Label = c("21:00:00", "21:10:00",
"21:20:00", "21:30:00", "21:40:00", "21:50:00", "22:00:00"
), class = "factor"), group1 = c(1L, NA, NA, 2L, NA, NA,
3L), group2 = 1:7), .Names = c("Date", "time", "group1",
"group2"), class = "data.frame", row.names = c(NA, -7L))
ddf$group1a = ddf$group1
for(i in nrow(ddf):1)
if(is.na(ddf$group1a[i]))
ddf$group1a[i] = ddf$group1a[i+1]
outdf = stack(with(ddf, tapply(group2, group1a, sum)))
names(outdf) = c("group2","group1")
outdf = outdf[,c(2,1)]
outdf
# group1 group2
#1 1 1
#2 2 9
#3 3 18
Upvotes: 1
Reputation: 886938
Using data.table
library(data.table)
DT1 <- DT[, group1:=cumsum(!is.na(c(0, group1[1:(.N-1)])))][,
list(Datetime=Datetime[.N],group2=sum(group2)), by=group1][,c(2,1,3), with=FALSE]
DT1
# Datetime group1 group2
#1: 2011-08-08 21:00:00 1 1
#2: 2011-08-08 21:30:00 2 9
#3: 2011-08-08 22:00:00 3 18
dat <- structure(list(Datetime = c("2011-08-08 21:00:00", "2011-08-08 21:10:00",
"2011-08-08 21:20:00", "2011-08-08 21:30:00", "2011-08-08 21:40:00",
"2011-08-08 21:50:00", "2011-08-08 22:00:00"), group1 = c(1L,
NA, NA, 2L, NA, NA, 3L), group2 = 1:7), .Names = c("Datetime",
"group1", "group2"), class = "data.frame", row.names = c(NA,
-7L))
DT <- data.table(dat)
Upvotes: 0