dcasblue
dcasblue

Reputation: 15

Convert yearly to monthly data in r

I have a yearly data that has value changes in the middle of the year. I would like to make it into a monthly data showing the change in the value. Here is snippet of my data.

year    value   Start date  End date
1985    35451   7/1/1985            3/20/1986
1986    45600   3/21/1986   12/23/1986
1987    46089   1/1/1987            10/31/1989

I would like to have all moths in columns and years in rows (something like below but without the break after Jun):

Jan Feb Mar Apr May Jun
1985    0   0   0   0   0   0
1986    35451   35451   38725   45600   45600   45600

Jul Aug Sep Oct Nov Dec
1985    35451   35451   35451   35451   35451   35451
1986    45600   45600   45600   45600   45600   45726

Months March and Dec of 1986 have weighted average values since the change in values occur during the month.

Thank you and appreciate it.

Upvotes: 0

Views: 1288

Answers (1)

alistaire
alistaire

Reputation: 43364

Really all you need here is seq.Date and xtabs (or your favorite variant), but there's a lot of munging that needs to be done to make it work. Here in Hadleyverse packages, but rewrite in base or data.table if you prefer:

library(dplyr)
library(tidyr)
library(lubridate)

       # Format dates as dates, then,
df %>% mutate_each(funs(mdy), ends_with('date')) %>% 
    # evaluating each row separately,
    rowwise() %>% 
    # create a list column with a month-wise sequence of dates for each.
    mutate(month = list(seq.Date(Start.date, End.date, by = 'month'))) %>%
    # Expand list column to long form,
    unnest() %>% 
    # change year column to year of sequence, not label, and reduce month column to month.abb.
    mutate(year = year(month), month = month(month, label = TRUE)) %>% 
    # For each year-month combination,
    group_by(year, month) %>% 
    # take the mean of values, so each has only one row, then
    summarise(value = mean(value)) %>% 
    # spread the result to wide form.
    spread(month, value, fill = 0)    # or xtabs(value ~ year + month, data = .)

# Source: local data frame [5 x 13]
# Groups: year [5]
# 
#    year   Jan   Feb     Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
#   (dbl) (dbl) (dbl)   (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
# 1  1985     0     0     0.0     0     0     0 35451 35451 35451 35451 35451 35451
# 2  1986 35451 35451 40525.5 45600 45600 45600 45600 45600 45600 45600 45600 45600
# 3  1987 46089 46089 46089.0 46089 46089 46089 46089 46089 46089 46089 46089 46089
# 4  1988 46089 46089 46089.0 46089 46089 46089 46089 46089 46089 46089 46089 46089
# 5  1989 46089 46089 46089.0 46089 46089 46089 46089 46089 46089 46089     0     0

Upvotes: 1

Related Questions