NMe
NMe

Reputation: 31

R: Transforming incomplete timeseries into matrix

I have a monthly time series, that should be transformed into a matrix. (6 years, but the last 4 months are missing!)

       Jan      Feb      Mar      Apr      May      Jun      Jul      Aug      Sep      Oct      Nov      Dec
1 62.17416 66.34748 69.58154 64.92033 70.51246 70.83153 74.19475 72.29366 77.97062 71.37903 67.08182 69.10599
2 59.41197 61.44036 68.97711 72.43558 69.92715 74.20776 79.19150 71.48650 77.56661 67.44390 67.05218 66.43690
3 58.79278 63.26696 70.97837 71.79423 72.04037 71.61184 72.64526 70.38479 72.43492 64.42046 66.87742 66.31345
4 58.06906 60.35554 70.91920 65.76937 68.85464 68.77921 70.09373 66.03685 71.38322 66.42558 67.62292 70.25223
5 60.59774 64.66411 73.27853 73.11031 73.57735 73.77766 78.45258 74.64698 79.07911 75.84108 75.82478 79.42484
6 69.30907 70.80211 78.38037 77.63537 79.67396 80.15397 80.45571 76.67389 

There are some values missing in the last year of the time series. When transforming into a matrix, the function replaces the missing values by using the first ones from the time series.

> matrix(xx,nrow=6,ncol=12,byrow = TRUE)
         [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]     [,9]    [,10]    [,11]    [,12]
[1,] 62.17416 66.34748 69.58154 64.92033 70.51246 70.83153 74.19475 72.29366 77.97062 71.37903 67.08182 69.10599
[2,] 59.41197 61.44036 68.97711 72.43558 69.92715 74.20776 79.19150 71.48650 77.56661 67.44390 67.05218 66.43690
[3,] 58.79278 63.26696 70.97837 71.79423 72.04037 71.61184 72.64526 70.38479 72.43492 64.42046 66.87742 66.31345
[4,] 58.06906 60.35554 70.91920 65.76937 68.85464 68.77921 70.09373 66.03685 71.38322 66.42558 67.62292 70.25223
[5,] 60.59774 64.66411 73.27853 73.11031 73.57735 73.77766 78.45258 74.64698 79.07911 75.84108 75.82478 79.42484
[6,] 69.30907 70.80211 78.38037 77.63537 79.67396 80.15397 80.45571 76.67389 62.17416 66.34748 69.58154 64.92033

How to avoid this or to replace the the last values with "NA"?

Upvotes: 0

Views: 49

Answers (1)

G. Grothendieck
G. Grothendieck

Reputation: 269654

Suppose xx is the time series given in the Note at the end.

1) Then we can convert it to a matrix without knowing how many missing items there are as follows:

tt <- time(xx)
tapply(xx, list(floor(tt), cycle(tt)), c)

giving this 6 x 12 matrix:

            1        2        3        4        5        6        7        8
2011 62.17416 66.34748 69.58154 64.92033 70.51246 70.83153 74.19475 72.29366
2012 59.41197 61.44036 68.97711 72.43558 69.92715 74.20776 79.19150 71.48650
2013 58.79278 63.26696 70.97837 71.79423 72.04037 71.61184 72.64526 70.38479
2014 58.06906 60.35554 70.91920 65.76937 68.85464 68.77921 70.09373 66.03685
2015 60.59774 64.66411 73.27853 73.11031 73.57735 73.77766 78.45258 74.64698
2016 69.30907 70.80211 78.38037 77.63537 79.67396 80.15397 80.45571 76.67389
            9       10       11       12
2011 77.97062 71.37903 67.08182 69.10599
2012 77.56661 67.44390 67.05218 66.43690
2013 72.43492 64.42046 66.87742 66.31345
2014 71.38322 66.42558 67.62292 70.25223
2015 79.07911 75.84108 75.82478 79.42484
2016       NA       NA       NA       NA

2) Another possibility is to coerce the length to the next highest multiple of 12 so that it exactly fills a matrix and the reshape it. This only works if the NAs to be added are at the end, as is the case in the question.

m <- xx # copy xx in case we still need it later
freq <- frequency(xx)  # 12
length(m) <- freq * ceiling(length(m) / freq)  # pads with NAs to achieve length
m <- matrix(m,, freq, byrow = TRUE) # reshape m into a matrix

giving:

> m
         [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]
[1,] 62.17416 66.34748 69.58154 64.92033 70.51246 70.83153 74.19475 72.29366
[2,] 59.41197 61.44036 68.97711 72.43558 69.92715 74.20776 79.19150 71.48650
[3,] 58.79278 63.26696 70.97837 71.79423 72.04037 71.61184 72.64526 70.38479
[4,] 58.06906 60.35554 70.91920 65.76937 68.85464 68.77921 70.09373 66.03685
[5,] 60.59774 64.66411 73.27853 73.11031 73.57735 73.77766 78.45258 74.64698
[6,] 69.30907 70.80211 78.38037 77.63537 79.67396 80.15397 80.45571 76.67389
         [,9]    [,10]    [,11]    [,12]
[1,] 77.97062 71.37903 67.08182 69.10599
[2,] 77.56661 67.44390 67.05218 66.43690
[3,] 72.43492 64.42046 66.87742 66.31345
[4,] 71.38322 66.42558 67.62292 70.25223
[5,] 79.07911 75.84108 75.82478 79.42484
[6,]       NA       NA       NA       NA

Note: The input time series xx in reproducible form is:

xx <- ts(c(62.17416, 66.34748, 69.58154, 64.92033, 70.51246, 
70.83153, 74.19475, 72.29366, 77.97062, 71.37903, 67.08182, 69.10599, 
59.41197, 61.44036, 68.97711, 72.43558, 69.92715, 74.20776, 79.1915, 
71.4865, 77.56661, 67.4439, 67.05218, 66.4369, 58.79278, 63.26696, 
70.97837, 71.79423, 72.04037, 71.61184, 72.64526, 70.38479, 72.43492, 
64.42046, 66.87742, 66.31345, 58.06906, 60.35554, 70.9192, 65.76937, 
68.85464, 68.77921, 70.09373, 66.03685, 71.38322, 66.42558, 67.62292, 
70.25223, 60.59774, 64.66411, 73.27853, 73.11031, 73.57735, 73.77766, 
78.45258, 74.64698, 79.07911, 75.84108, 75.82478, 79.42484, 69.30907, 
70.80211, 78.38037, 77.63537, 79.67396, 80.15397, 80.45571, 76.67389
), start = c(2011, 1), freq = 12)

Upvotes: 1

Related Questions