Gosia

Reputation: 33

How to create a time series by grouping entries in R?

I want to create a time series of daily mortality data in R, covering 01/01/2004 to 31/12/2010. The raw data I have now (a .csv file) has three columns (day, month, year), and every row is one death case. So if, for example, four deaths occurred on a certain day, there are four rows with that date. If no death case was reported on a specific day, that day is omitted from the dataset.

What I need is a time series with 2557 rows (one per day from 01/01/2004 until 31/12/2010) that lists the total number of death cases per day. If there is no death case on a certain day, I still need that day to be in the list with a "0" assigned to it.

Does anyone know how to do this?

Thanks, Gosia

Example of the raw data:

day month   year
1   1   2004
3   1   2004
3   1   2004
3   1   2004
6   1   2004
7   1   2004

What I need:

day month   year    deaths
1   1   2004    1
2   1   2004    0
3   1   2004    3
4   1   2004    0
5   1   2004    0
6   1   2004    1

Upvotes: 3

Views: 443

Answers (1)

Roland

Reputation: 132676

df <- read.table(text="day month   year
1   1   2004
3   1   2004
3   1   2004
3   1   2004
6   1   2004
7   1   2004",header=TRUE)

#transform to dates
dates <- as.Date(with(df,paste(year,month,day,sep="-")))

#contingency table
tab <- as.data.frame(table(dates))
names(tab)[2] <- "deaths"
tab$dates <- as.Date(tab$dates)

#sequence of dates
res <- data.frame(dates=seq(from=min(dates),to=max(dates),by="1 day"))
#merge
res <- merge(res,tab,by="dates",all.x=TRUE)
res[is.na(res$deaths),"deaths"] <- 0
res
#       dates deaths
#1 2004-01-01      1
#2 2004-01-02      0
#3 2004-01-03      3
#4 2004-01-04      0
#5 2004-01-05      0
#6 2004-01-06      1
#7 2004-01-07      1
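
A note on the code above: the sequence runs from min(dates) to max(dates), so any days before the first or after the last recorded death would not appear in the result. If the full period from the question is needed, something along these lines might work. It is only a sketch: it assumes the fixed start and end dates given in the question, and the names `all_days` and `counts` are illustrative, not part of the original answer.

```r
# Sample data from the question (in practice this would be read from the .csv file)
df <- data.frame(day = c(1, 3, 3, 3, 6, 7), month = 1, year = 2004)

# One Date per death case
dates <- as.Date(paste(df$year, df$month, df$day, sep = "-"))

# Every calendar day in the requested period (assumed from the question)
all_days <- seq(as.Date("2004-01-01"), as.Date("2010-12-31"), by = "day")

# Using the full day sequence as factor levels makes table() report
# a count of 0 for days without any rows in the raw data
counts <- table(factor(as.character(dates), levels = as.character(all_days)))

# Rebuild the day / month / year layout asked for in the question
res <- data.frame(
  day    = as.integer(format(all_days, "%d")),
  month  = as.integer(format(all_days, "%m")),
  year   = as.integer(format(all_days, "%Y")),
  deaths = as.vector(counts)
)

nrow(res)     # 2557
head(res, 7)
```

Because the zero counts come from the factor levels, no merge or NA replacement is needed. Death cases dated outside the chosen period would be dropped silently, so it may be worth checking range(dates) first.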

Upvotes: 3
