user12221453

Reputation: 21

Create matrix from dataset in R

I want to create a matrix from my data. My data has two columns: a date and the observation for that date. I want the matrix to have years as rows and days as columns, e.g.:

      17   18   19   20   ...   31
1904  x11  x12  ...
1905
1906
.
.
.
2019

The days are the December days of each year. Missing values should be NA.

Here's a sample of my data:

> head(cdata)
# A tibble: 6 x 2
  Datum               Snödjup
  <dttm>                <dbl>
1 1904-12-01 00:00:00    0.02
2 1904-12-02 00:00:00    0.02
3 1904-12-03 00:00:00    0.01
4 1904-12-04 00:00:00    0.01
5 1904-12-12 00:00:00    0.02
6 1904-12-13 00:00:00    0.02

I figured the first thing I need to do is split the date (ISO format, YYYY-MM-DD) into year, month and day. I did that, dropped the date column (Datum), and removed the irrelevant days, namely those before the 17th.

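library(dplyr)

# Split the date into year, month and day, then drop the original column
cd <- cdata %>%
  dplyr::mutate(year = lubridate::year(Datum),
                month = lubridate::month(Datum),
                day = lubridate::day(Datum)) %>%
  dplyr::select(-Datum)

# Keep only 17-31 December
cu <- cd[which(cd$day > 16 & cd$day < 32 & cd$month == 12), ]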

and now it looks like this:

> cu
# A tibble: 1,284 x 4
   Snödjup  year month   day
     <dbl> <dbl> <dbl> <int>
 1    0.01  1904    12    26
 2    0.01  1904    12    27
 3    0.01  1904    12    28
 4    0.12  1904    12    29
 5    0.12  1904    12    30
 6    0.15  1904    12    31
 7    0.07  1906    12    17
 8    0.05  1906    12    18
 9    0.05  1906    12    19
10    0.04  1906    12    20
# … with 1,274 more rows

Now I need to fit my data into a matrix with missing values as NA. Is there any way to do this?

Upvotes: 2

Views: 837

Answers (2)

Ronak Shah

Reputation: 388862

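You can try:

library(dplyr)
library(tidyr)

cdata %>%
   mutate(year = lubridate::year(Datum),
          month = lubridate::month(Datum),
          day = lubridate::day(Datum)) %>%
   # keep only 17-31 December
   filter(month == 12, day >= 17) %>%
   # add the missing days of every observed year, filled with NA
   complete(year, day = 17:31) %>%
   select(year, day, Snödjup) %>%
   pivot_wider(names_from = day, values_from = Snödjup)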
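pivot_wider() returns a tibble rather than a matrix, and a year with no observations at all (1905 in the question's sample) does not get a row. The following is a minimal sketch of one way to handle both, not a definitive solution. The data frame `toy` and its column `depth` (standing in for Snödjup) are hypothetical and exist only to make the example self-contained.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for cdata: a few December observations with gaps,
# and no observations at all for 1905. `depth` stands in for Snödjup.
toy <- tibble(
  Datum = as.POSIXct(c("1904-12-16", "1904-12-26", "1904-12-31",
                       "1906-12-17", "1906-12-18"), tz = "UTC"),
  depth = c(0.02, 0.01, 0.15, 0.07, 0.05)
)

wide <- toy %>%
  mutate(year = lubridate::year(Datum),
         month = lubridate::month(Datum),
         day = lubridate::day(Datum)) %>%
  filter(month == 12, day >= 17) %>%
  select(year, day, depth) %>%
  # full_seq() also creates the years that are absent from the data
  complete(year = full_seq(year, 1), day = 17:31) %>%
  pivot_wider(names_from = day, values_from = depth)

# Drop the year column and reuse it as the row names of a numeric matrix
m <- as.matrix(wide[, -1])
rownames(m) <- wide$year
```

Cells can then be looked up by name, for example m["1904", "26"], and the row for 1905 consists entirely of NA.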

Upvotes: 1

jay.sf

Reputation: 72673

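A base R approach using by(). It relies on every year having all 31 December days, as in the toy data below.

# One row per year: x[[2]] extracts the value column as a plain vector
r <- `colnames<-`(do.call(rbind, by(dat, substr(dat$date, 1, 4), function(x) x[[2]])), 1:31)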
r[,17:31]
#         17    18    19   20    21    22    23   24    25    26    27    28   29    30   31
# 1904 -0.28 -2.66 -2.44 1.32 -0.31 -1.78 -0.17 1.21  1.90 -0.43 -0.26 -1.76 0.46 -0.64 0.46
# 1905  1.44 -0.43  0.66 0.32 -0.78  1.58  0.64 0.09  0.28  0.68  0.09 -2.99 0.28 -0.37 0.19
# 1906 -0.89 -1.10  1.51 0.26  0.09 -0.12 -1.19 0.61 -0.22 -0.18  0.93  0.82 1.39 -0.48 0.65

Toy data

set.seed(42)
dat <- do.call(rbind, lapply(1904:1906, function(x) 
  data.frame(date=seq(ISOdate(x, 12, 1, 0), ISOdate(x, 12, 31, 0), "day" ),
             value=round(rnorm(31), 2))))
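When some days or whole years are missing, as in the question's data, the rows no longer have equal length. One possible base R variant, sketched here under assumptions, starts from an all-NA matrix and fills only the observed cells by matrix indexing. The data frame `obs` and its values are hypothetical and are not taken from the question.

```r
# Hypothetical long-format data with gaps: 1905 is missing entirely and
# 1904 has no observation between 17 and 25 December.
obs <- data.frame(
  date  = as.Date(c("1904-12-26", "1904-12-31", "1906-12-17", "1906-12-18")),
  value = c(0.01, 0.15, 0.07, 0.05)
)

yr  <- as.integer(format(obs$date, "%Y"))
mon <- as.integer(format(obs$date, "%m"))
day <- as.integer(format(obs$date, "%d"))

years <- min(yr):max(yr)
days  <- 17:31

# Start from an all-NA matrix with one row per year and one column per day
m <- matrix(NA_real_, nrow = length(years), ncol = length(days),
            dimnames = list(as.character(years), as.character(days)))

# Fill the observed cells; each row of the index matrix is a (row, col) pair
keep <- mon == 12 & day %in% days
m[cbind(match(yr[keep], years), match(day[keep], days))] <- obs$value[keep]
```

Every cell that is never assigned stays NA, so no separate completion step is needed.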

Upvotes: 1
