Reputation: 61
I've a data frame like this
X.1 X date adjClose lagx pct_change rollmeanx rollsdx roll_sharpe
1 1 1 2019-01-02 119.1437 NA NA NA NA NA
2 2 2 2019-01-03 120.4995 119.1437 0.0112514165 NA NA NA
3 3 3 2019-01-04 119.1047 120.4995 -0.0117107526 NA NA NA
4 4 4 2019-01-07 118.7535 119.1047 -0.0029568789 NA NA NA
5 5 5 2019-01-08 118.4414 118.7535 -0.0026352631 NA NA NA
6 6 6 2019-01-09 118.2561 118.4414 -0.0015671396 NA NA NA
7 7 7 2019-01-10 117.4953 118.2561 -0.0064751785 NA NA NA
8 8 8 2019-01-11 117.9537 117.4953 0.0038865459 NA NA NA
9 9 9 2019-01-14 117.5148 117.9537 -0.0037350598 NA NA NA
10 10 10 2019-01-15 117.0856 117.5148 -0.0036654449 NA NA NA
11 11 11 2019-01-16 117.2026 117.0856 0.0009986684 NA NA NA
12 12 12 2019-01-17 117.2319 117.2026 0.0002496048 NA NA NA
13 13 13 2019-01-18 116.6174 117.2319 -0.0052693208 NA NA NA
14 14 14 2019-01-22 117.4075 116.6174 0.0067292515 NA NA NA
15 15 15 2019-01-23 117.3587 117.4075 -0.0004155585 NA NA NA
16 16 16 2019-01-24 118.1293 117.3587 0.0065229956 NA NA NA
17 17 17 2019-01-25 117.5635 118.1293 -0.0048120800 NA NA NA
18 18 18 2019-01-28 117.4465 117.5635 -0.0009965950 NA NA NA
19 19 19 2019-01-29 118.0415 117.4465 0.0050404892 NA NA NA
20 20 20 2019-01-30 117.9537 118.0415 -0.0007442322 NA NA NA
21 21 21 2019-01-31 118.9681 117.9537 0.0085266869 NA NA NA
22 22 22 2019-02-01 118.2510 118.9681 -0.0060639085 NA NA NA
23 23 23 2019-02-04 117.7231 118.2510 -0.0044843049 NA NA NA
24 24 24 2019-02-05 118.2608 117.7231 0.0045465818 NA NA NA
25 25 25 2019-02-06 118.3195 118.2608 0.0004957449 NA NA NA
26 26 26 2019-02-07 119.1016 118.3195 0.0065665271 NA NA NA
27 27 27 2019-02-08 119.6099 119.1016 0.0042501022 1.358807e-04 0.005456898 1.358807e-04
28 28 28 2019-02-11 119.1407 119.6099 -0.0039386231 -4.483516e-04 0.005014440 -4.483516e-04
29 29 29 2019-02-12 118.8376 119.1407 -0.0025501810 -9.602194e-05 0.004485371 -9.602194e-05
30 30 30 2019-02-13 118.3879 118.8376 -0.0037985136 -1.283925e-04 0.004509813 -1.283925e-04
31 31 31 2019-02-14 119.0624 118.3879 0.0056654898 1.908672e-04 0.004617770 1.908672e-04
32 32 32 2019-02-15 119.2482 119.0624 0.0015576324 3.110508e-04 0.004610843 3.110508e-04
33 33 33 2019-02-19 119.5512 119.2482 0.0025349579 6.575945e-04 0.004414827 6.575945e-04
34 34 34 2019-02-20 119.2091 119.5512 -0.0028702641 3.977172e-04 0.004416021 3.977172e-04
35 35 35 2019-02-21 118.1435 119.2091 -0.0090194456 1.944715e-04 0.004724659 1.944715e-04
36 36 36 2019-02-22 118.8474 118.1435 0.0059225138 5.632392e-04 0.004785127 5.632392e-04
37 37 37 2019-02-25 118.4954 118.8474 -0.0029700520 4.105961e-04 0.004833734 4.105961e-04
38 38 38 2019-02-26 119.0820 118.4954 0.0049257040 5.904461e-04 0.004913833 5.904461e-04
39 39 39 2019-02-27 117.7329 119.0820 -0.0114589388 3.523838e-04 0.005340492 3.523838e-04
40 40 40 2019-02-28 117.3321 117.7329 -0.0034160973 -3.782189e-05 0.005225321 -3.782189e-05
what I need is a dataframe with only the end of month rows.
What I managed to do till now is this:
'''
library(tidyverse)
library(roll)
library (xts)
library(PerformanceAnalytics)
library(zoo)
library(tseries)
tlt_data <- read.csv('tlt_r.csv')
d <- data.frame(date = as.Date(tlt_data$date, origin = '1970-01-01'))
d$day <- format(d$date, format = '%d')
d$my <- format(d$date, format = '%Y-%m')
fds <- with(d, tapply(day, my, max))
fds <- as.Date(paste(row.names(fds), fds, sep = '-'))
xdf_endofmonth <- tlt_data[tlt_data$date %in% fds, 'date', 'roll_sharpe']
'''
Tnx for your hints.
Upvotes: 2
Views: 734
Reputation: 16178
There is probably much better and elegant way of doing this, but here a possible solution.
Starting from the date format of your column dates, you can separate it in month and year by using month
and year
function from lubridate
package.
Then, to each month and year, you add 01 in order to create the first date of each month. You can then group by this date and select the maximal date of each group which correspond to the last date of each month.
library(lubridate)
df %>% mutate(Month = month(date), Year = year(date)) %>%
mutate(Ini_date = ymd(paste(Year, Month, "01", sep = "-"))) %>%
group_by(Ini_date) %>%
filter(date == max(date))
# A tibble: 61 x 6
# Groups: Ini_date [61]
date t Value Month Year Ini_date
<date> <int> <dbl> <dbl> <dbl> <date>
1 2015-03-31 16 -0.0729 3 2015 2015-03-01
2 2015-04-30 46 2.09 4 2015 2015-04-01
3 2015-05-31 77 -0.611 5 2015 2015-05-01
4 2015-06-30 107 -1.36 6 2015 2015-06-01
5 2015-07-31 138 1.44 7 2015 2015-07-01
6 2015-08-31 169 -1.09 8 2015 2015-08-01
7 2015-09-30 199 -0.152 9 2015 2015-09-01
8 2015-10-31 230 1.04 10 2015 2015-10-01
9 2015-11-30 260 -0.494 11 2015 2015-11-01
10 2015-12-31 291 -0.703 12 2015 2015-12-01
# … with 51 more rows
Does it answer your question ?
Reproducible example
date <- seq(ymd("2015-03-16"), ymd("2020-03-13"), by = "day")
df <- data.frame(date = date,
t = 1:1825,
Value = rnorm(1825))
Upvotes: 4