Reputation: 83
I have a problem I have been working. I have a dataset of dates (in a data frame format) that I need to search through and find the last day of each month and put them into a new data frame. There is also a value in the next column that goes with it. Heres a sample of the dates.
[1] "2015-05-21" "2015-05-20" "2015-05-19" "2015-05-18" "2015-05-15" "2015-05-14"
[7] "2015-05-13" "2015-05-12" "2015-05-11" "2015-05-08" "2015-05-07" "2015-05-06"
[13] "2015-05-05" "2015-05-04" "2015-05-01" "2015-04-30" "2015-04-29" "2015-04-28"
[19] "2015-04-27" "2015-04-24" "2015-04-23" "2015-04-22" "2015-04-21" "2015-04-20"
[25] "2015-04-17" "2015-04-16" "2015-04-15" "2015-04-14" "2015-04-13" "2015-04-10"
[31] "2015-04-09" "2015-04-08" "2015-04-07" "2015-04-06" "2015-04-02" "2015-04-01"
[37] "2015-03-31" "2015-03-30" "2015-03-27" "2015-03-26" "2015-03-25" "2015-03-24"
[43] "2015-03-23" "2015-03-20" "2015-03-19" "2015-03-18" "2015-03-17" "2015-03-16"
[49] "2015-03-13" "2015-03-12" "2015-03-11" "2015-03-10" "2015-03-09" "2015-03-06"
[55] "2015-03-05" "2015-03-04" "2015-03-03" "2015-03-02" "2015-02-27" "2015-02-26"
[61] "2015-02-25" "2015-02-24" "2015-02-23" "2015-02-20" "2015-02-19" "2015-02-18"
[67] "2015-02-17" "2015-02-13" "2015-02-12" "2015-02-11" "2015-02-10" "2015-02-09"
[73] "2015-02-06" "2015-02-05" "2015-02-04" "2015-02-03" "2015-02-02" "2015-01-30"
[79] "2015-01-29" "2015-01-28" "2015-01-27" "2015-01-26" "2015-01-23" "2015-01-22"
[85] "2015-01-21" "2015-01-20" "2015-01-16" "2015-01-15" "2015-01-14" "2015-01-13"
[91] "2015-01-12" "2015-01-09" "2015-01-08" "2015-01-07" "2015-01-06" "2015-01-05"
[97] "2015-01-02" "2014-12-31" "2014-12-30" "2014-12-29" "2014-12-26" "2014-12-24"
[103] "2014-12-23" "2014-12-22" "2014-12-19" "2014-12-18" "2014-12-17" "2014-12-16"
[109] "2014-12-15" "2014-12-12" "2014-12-11" "2014-12-10" "2014-12-09" "2014-12-08"
[115] "2014-12-05" "2014-12-04" "2014-12-03" "2014-12-02" "2014-12-01" "2014-11-28"
[121] "2014-11-26" "2014-11-25" "2014-11-24" "2014-11-21" "2014-11-20" "2014-11-19"
[127] "2014-11-18" "2014-11-17" "2014-11-14" "2014-11-13" "2014-11-12" "2014-11-11"
[133] "2014-11-10" "2014-11-07" "2014-11-06" "2014-11-05" "2014-11-04" "2014-11-03"
[139] "2014-10-31" "2014-10-30" "2014-10-29" "2014-10-28" "2014-10-27" "2014-10-24"
[145] "2014-10-23" "2014-10-22" "2014-10-21" "2014-10-20" "2014-10-17" "2014-10-16"
[151] "2014-10-15" "2014-10-14" "2014-10-13" "2014-10-10" "2014-10-09" "2014-10-08"
[157] "2014-10-07" "2014-10-06" "2014-10-03" "2014-10-02" "2014-10-01" "2014-09-30"
[163] "2014-09-29" "2014-09-26" "2014-09-25" "2014-09-24" "2014-09-23" "2014-09-22"
[169] "2014-09-19" "2014-09-18" "2014-09-17" "2014-09-16" "2014-09-15" "2014-09-12"
This is a small portion. There are 5700 rows...
Upvotes: 1
Views: 6823
Reputation: 1634
if you have a data frame,'a' including date variable,'date',
library(dplyr)
a<-mutate(a,endmonth=as.Date(paste0(as.numeric(format(date,"%Y%m"))+1,"01"),"%Y%m%d")-1)
for example :
a<-data.frame(date=as.Date("2016-02-15","2016-04-05"))
a1<-mutate(a,endmonth=
as.Date(paste0(as.numeric(format(date,"%Y%m"))+1,"01"),"%Y%m%d")-1)
>a1
date endmonth
1 2016-02-15 2016-02-29
2 2016-04-05 2016-04-30
Upvotes: 0
Reputation: 270248
1) Try tapply
like this:
dates <- c("2015-05-13", "2015-05-12", "2015-05-11", "2015-04-27",
"2015-04-24", "2015-04-23")
tapply(dates, substr(dates, 1, 7), max)
or this:
library(zoo)
tapply(dates, as.yearmon(dates), max)
2) If the dates are in a data frame:
DF <- data.frame(dates, stringsAsFactors = FALSE)
aggregate(DF["dates"], list(month = substr(DF$dates, 1, 7)), max)
or replace substr(...)
with as.yearmon(DF$dates)
.
3) This picks out the last row of each month. In this case ave
returns a character variable so need to use as.logical
to convert it to logical:
isMax <- function(x) seq_along(x) == which.max(as.Date(x))
subset(DF, as.logical(ave(dates, substr(dates, 1, 7), FUN = isMax)))
or use as.yearmon(dates)
in place of substr(...)
. The following definition of isMax
could be substituted
isMax <- function(x) seq_along(x) == which.max(as.Date(x))
and gives the same result except when there are multiple maxima. In that case it gives the first only whereas the earlier definition gives them all. As which.max
does not work with "character"
class we converted dates to "Date"
class.
Next time please use dput
to display your data or otherwise provide it in reproducible form as we did above. Also cut it down to the minimum amount needed to illustrate the problem.
Upvotes: 10
Reputation: 6931
Here's an alternative solution using dplyr
to group by month and year and then filter the max
i.e. last date of each yearmonth:
df <- data.frame(dates=as.Date(c("2015-05-21", "2015-05-20", "2015-05-19", "2015-05-18",
"2015-05-15", "2015-05-14","2014-12-05", "2014-12-04", "2014-12-03",
"2014-12-02", "2014-12-01", "2014-11-28")))
library(dplyr)
df %>%
group_by(strftime(dates, "%Y-%m")) %>% #Groups by the yearmonths
filter(dates == max(dates)) %>% #Take the last date of each group
.$dates #Returns the filtered dates as a vector
Resulting in:
[1] "2015-05-21" "2014-12-05" "2014-11-28"
Thanks to @akrun for tips on fixing and improving the code.
Upvotes: 5