Reputation: 569
I am working with a dataframe that looks like this:
date<-c("2012-02-01", "2012-02-01", "2012-02-03", "2012-02-04", "2012-02-04", "2012-02-05", "2012-02-09", "2012-02-12", "2012-02-12")
var<-c("a","b","c","d","e","f","g","h","i")
df1<-data.frame(date,var)
I would like to create a second dataframe that will tabulate the number of observations I have each day. In that dataframe, the dates that are not mentioned would get a zero...resulting in something like this:
date<-c("2012-02-01","2012-02-02","2012-02-03","2012-02-04","2012-02-05","2012-02-06","2012-02-07","2012-02-08","2012-02-09","2012-02-10","2012-02-11","2012-02-12")
num<-c(2,0,1,2,1,0,0,0,1,0,0,2)
df2<-data.frame(date,num)
I have tried a number of things with the aggregate function, but can't figure out how to include the dates with no observations (the zeros).
Upvotes: 7
Views: 2029
Reputation: 267
Get your index into Postxct format, then:
counts <- data.frame(table(as.Date(index(my_data_frame))))
Upvotes: 0
Reputation: 482
I recently dealt with something like this. I would create a data frame with all of the dates you want to consider and use the merge()
function to do what you are suggesting.
df1$date <- as.Date(df1$date, format = "%Y-%m-%d")
newdates <- data.frame(date=seq(as.Date('2012-02-01'),as.Date('2012-02-12'),1))
df2 <- merge(df1, newdates, by = "date", all = TRUE)
The all = TRUE
is crucial here, it introduces NA
s where df1
and df2
don't match up instead of deleting these instances.
Then use the plyr
package to get counts:
library(plyr)
ddply(df2, "date", function(x) sum(!is.na(x$var)))
This splits df2
into groups by unique values of df2$date
, then finds how many values of df2$var
were not NA
, then returns that number along with the unique value of df2$date
it represents.
Upvotes: 0
Reputation: 115392
Here an approach using data.table
library(data.table)
DF1 <- as.data.table(df1)
# coerce date to a date object
DF1[, date := as.IDate(as.character(date), format = '%Y-%m-%d')]
# setkey for joining
setkey(DF1, date)
# create a data.table that matches with a data.table containing
# a sequence from the minimum date to the maximum date
# nomatch = NA includes those non-matching.
# .N is the number of rows in the subset data.frame
# this is 0 when there are no matches
DF2 <- DF1[J(DF1[,seq(min(date), max(date), by = 1)]), .N, nomatch = NA]
DF2
date N
1: 2012-02-01 2
2: 2012-02-02 0
3: 2012-02-03 1
4: 2012-02-04 2
5: 2012-02-05 1
6: 2012-02-06 0
7: 2012-02-07 0
8: 2012-02-08 0
9: 2012-02-09 1
10: 2012-02-10 0
11: 2012-02-11 0
12: 2012-02-12 2
An approach using reshape2::dcast
If you ensure that your date
column has levels for every day that you wish to tabulate
df1$date <- with(df1, factor(date, levels = as.character(seq(min(as.Date(as.character(date))), max(as.Date(as.character(date))), by = 1 ))))
df2 <- dcast(df1, date~., drop = FALSE)
Upvotes: 2