Eugenio
Eugenio

Reputation: 61

R how to extract end of month rows from a timeserie

I've a data frame like this

X.1   X       date adjClose     lagx    pct_change     rollmeanx     rollsdx   roll_sharpe  
1     1   1 2019-01-02 119.1437       NA            NA            NA          NA            NA  
2     2   2 2019-01-03 120.4995 119.1437  0.0112514165            NA          NA            NA  
3     3   3 2019-01-04 119.1047 120.4995 -0.0117107526            NA          NA            NA  
4     4   4 2019-01-07 118.7535 119.1047 -0.0029568789            NA          NA            NA  
5     5   5 2019-01-08 118.4414 118.7535 -0.0026352631            NA          NA            NA  
6     6   6 2019-01-09 118.2561 118.4414 -0.0015671396            NA          NA            NA  
7     7   7 2019-01-10 117.4953 118.2561 -0.0064751785            NA          NA            NA  
8     8   8 2019-01-11 117.9537 117.4953  0.0038865459            NA          NA            NA  
9     9   9 2019-01-14 117.5148 117.9537 -0.0037350598            NA          NA            NA  
10   10  10 2019-01-15 117.0856 117.5148 -0.0036654449            NA          NA            NA  
11   11  11 2019-01-16 117.2026 117.0856  0.0009986684            NA          NA            NA  
12   12  12 2019-01-17 117.2319 117.2026  0.0002496048            NA          NA            NA  
13   13  13 2019-01-18 116.6174 117.2319 -0.0052693208            NA          NA            NA  
14   14  14 2019-01-22 117.4075 116.6174  0.0067292515            NA          NA            NA  
15   15  15 2019-01-23 117.3587 117.4075 -0.0004155585            NA          NA            NA  
16   16  16 2019-01-24 118.1293 117.3587  0.0065229956            NA          NA            NA  
17   17  17 2019-01-25 117.5635 118.1293 -0.0048120800            NA          NA            NA  
18   18  18 2019-01-28 117.4465 117.5635 -0.0009965950            NA          NA            NA  
19   19  19 2019-01-29 118.0415 117.4465  0.0050404892            NA          NA            NA  
20   20  20 2019-01-30 117.9537 118.0415 -0.0007442322            NA          NA            NA  
21   21  21 2019-01-31 118.9681 117.9537  0.0085266869            NA          NA            NA  
22   22  22 2019-02-01 118.2510 118.9681 -0.0060639085            NA          NA            NA  
23   23  23 2019-02-04 117.7231 118.2510 -0.0044843049            NA          NA            NA  
24   24  24 2019-02-05 118.2608 117.7231  0.0045465818            NA          NA            NA  
25   25  25 2019-02-06 118.3195 118.2608  0.0004957449            NA          NA            NA  
26   26  26 2019-02-07 119.1016 118.3195  0.0065665271            NA          NA            NA  
27   27  27 2019-02-08 119.6099 119.1016  0.0042501022  1.358807e-04 0.005456898  1.358807e-04  
28   28  28 2019-02-11 119.1407 119.6099 -0.0039386231 -4.483516e-04 0.005014440 -4.483516e-04  
29   29  29 2019-02-12 118.8376 119.1407 -0.0025501810 -9.602194e-05 0.004485371 -9.602194e-05  
30   30  30 2019-02-13 118.3879 118.8376 -0.0037985136 -1.283925e-04 0.004509813 -1.283925e-04  
31   31  31 2019-02-14 119.0624 118.3879  0.0056654898  1.908672e-04 0.004617770  1.908672e-04  
32   32  32 2019-02-15 119.2482 119.0624  0.0015576324  3.110508e-04 0.004610843  3.110508e-04  
33   33  33 2019-02-19 119.5512 119.2482  0.0025349579  6.575945e-04 0.004414827  6.575945e-04  
34   34  34 2019-02-20 119.2091 119.5512 -0.0028702641  3.977172e-04 0.004416021  3.977172e-04  
35   35  35 2019-02-21 118.1435 119.2091 -0.0090194456  1.944715e-04 0.004724659  1.944715e-04  
36   36  36 2019-02-22 118.8474 118.1435  0.0059225138  5.632392e-04 0.004785127  5.632392e-04  
37   37  37 2019-02-25 118.4954 118.8474 -0.0029700520  4.105961e-04 0.004833734  4.105961e-04  
38   38  38 2019-02-26 119.0820 118.4954  0.0049257040  5.904461e-04 0.004913833  5.904461e-04  
39   39  39 2019-02-27 117.7329 119.0820 -0.0114589388  3.523838e-04 0.005340492  3.523838e-04  
40   40  40 2019-02-28 117.3321 117.7329 -0.0034160973 -3.782189e-05 0.005225321 -3.782189e-05  

what I need is a dataframe with only the end of month rows.

What I managed to do till now is this:

'''

library(tidyverse)  
library(roll)  
library (xts)  
library(PerformanceAnalytics)  
library(zoo)  
library(tseries)  
tlt_data <- read.csv('tlt_r.csv')  
d <- data.frame(date = as.Date(tlt_data$date, origin = '1970-01-01'))  
d$day <- format(d$date, format = '%d')  
d$my <- format(d$date, format = '%Y-%m')  
fds <- with(d, tapply(day, my, max))  
fds <- as.Date(paste(row.names(fds), fds, sep = '-'))  
xdf_endofmonth <- tlt_data[tlt_data$date %in% fds, 'date', 'roll_sharpe'] 

'''

Tnx for your hints.

Upvotes: 2

Views: 734

Answers (1)

dc37
dc37

Reputation: 16178

There is probably much better and elegant way of doing this, but here a possible solution.

Starting from the date format of your column dates, you can separate it in month and year by using month and year function from lubridate package.

Then, to each month and year, you add 01 in order to create the first date of each month. You can then group by this date and select the maximal date of each group which correspond to the last date of each month.

library(lubridate)
df %>% mutate(Month = month(date), Year = year(date)) %>%
  mutate(Ini_date = ymd(paste(Year, Month, "01", sep = "-"))) %>% 
  group_by(Ini_date) %>% 
  filter(date == max(date))

# A tibble: 61 x 6
# Groups:   Ini_date [61]
   date           t   Value Month  Year Ini_date  
   <date>     <int>   <dbl> <dbl> <dbl> <date>    
 1 2015-03-31    16 -0.0729     3  2015 2015-03-01
 2 2015-04-30    46  2.09       4  2015 2015-04-01
 3 2015-05-31    77 -0.611      5  2015 2015-05-01
 4 2015-06-30   107 -1.36       6  2015 2015-06-01
 5 2015-07-31   138  1.44       7  2015 2015-07-01
 6 2015-08-31   169 -1.09       8  2015 2015-08-01
 7 2015-09-30   199 -0.152      9  2015 2015-09-01
 8 2015-10-31   230  1.04      10  2015 2015-10-01
 9 2015-11-30   260 -0.494     11  2015 2015-11-01
10 2015-12-31   291 -0.703     12  2015 2015-12-01
# … with 51 more rows

Does it answer your question ?


Reproducible example

date <- seq(ymd("2015-03-16"), ymd("2020-03-13"), by = "day")
df <- data.frame(date = date,
                 t = 1:1825,
                 Value = rnorm(1825))

Upvotes: 4

Related Questions