Tony
Tony

Reputation: 83

Finding the last date of each month in a data frame

I have a problem I have been working. I have a dataset of dates (in a data frame format) that I need to search through and find the last day of each month and put them into a new data frame. There is also a value in the next column that goes with it. Heres a sample of the dates.

   [1] "2015-05-21" "2015-05-20" "2015-05-19" "2015-05-18" "2015-05-15" "2015-05-14"
   [7] "2015-05-13" "2015-05-12" "2015-05-11" "2015-05-08" "2015-05-07" "2015-05-06"
  [13] "2015-05-05" "2015-05-04" "2015-05-01" "2015-04-30" "2015-04-29" "2015-04-28"
  [19] "2015-04-27" "2015-04-24" "2015-04-23" "2015-04-22" "2015-04-21" "2015-04-20"
  [25] "2015-04-17" "2015-04-16" "2015-04-15" "2015-04-14" "2015-04-13" "2015-04-10"
  [31] "2015-04-09" "2015-04-08" "2015-04-07" "2015-04-06" "2015-04-02" "2015-04-01"
  [37] "2015-03-31" "2015-03-30" "2015-03-27" "2015-03-26" "2015-03-25" "2015-03-24"
  [43] "2015-03-23" "2015-03-20" "2015-03-19" "2015-03-18" "2015-03-17" "2015-03-16"
  [49] "2015-03-13" "2015-03-12" "2015-03-11" "2015-03-10" "2015-03-09" "2015-03-06"
  [55] "2015-03-05" "2015-03-04" "2015-03-03" "2015-03-02" "2015-02-27" "2015-02-26"
  [61] "2015-02-25" "2015-02-24" "2015-02-23" "2015-02-20" "2015-02-19" "2015-02-18"
  [67] "2015-02-17" "2015-02-13" "2015-02-12" "2015-02-11" "2015-02-10" "2015-02-09"
  [73] "2015-02-06" "2015-02-05" "2015-02-04" "2015-02-03" "2015-02-02" "2015-01-30"
  [79] "2015-01-29" "2015-01-28" "2015-01-27" "2015-01-26" "2015-01-23" "2015-01-22"
  [85] "2015-01-21" "2015-01-20" "2015-01-16" "2015-01-15" "2015-01-14" "2015-01-13"
  [91] "2015-01-12" "2015-01-09" "2015-01-08" "2015-01-07" "2015-01-06" "2015-01-05"
  [97] "2015-01-02" "2014-12-31" "2014-12-30" "2014-12-29" "2014-12-26" "2014-12-24"
 [103] "2014-12-23" "2014-12-22" "2014-12-19" "2014-12-18" "2014-12-17" "2014-12-16"
 [109] "2014-12-15" "2014-12-12" "2014-12-11" "2014-12-10" "2014-12-09" "2014-12-08"
 [115] "2014-12-05" "2014-12-04" "2014-12-03" "2014-12-02" "2014-12-01" "2014-11-28"
 [121] "2014-11-26" "2014-11-25" "2014-11-24" "2014-11-21" "2014-11-20" "2014-11-19"
 [127] "2014-11-18" "2014-11-17" "2014-11-14" "2014-11-13" "2014-11-12" "2014-11-11"
 [133] "2014-11-10" "2014-11-07" "2014-11-06" "2014-11-05" "2014-11-04" "2014-11-03"
 [139] "2014-10-31" "2014-10-30" "2014-10-29" "2014-10-28" "2014-10-27" "2014-10-24"
 [145] "2014-10-23" "2014-10-22" "2014-10-21" "2014-10-20" "2014-10-17" "2014-10-16"
 [151] "2014-10-15" "2014-10-14" "2014-10-13" "2014-10-10" "2014-10-09" "2014-10-08"
 [157] "2014-10-07" "2014-10-06" "2014-10-03" "2014-10-02" "2014-10-01" "2014-09-30"
 [163] "2014-09-29" "2014-09-26" "2014-09-25" "2014-09-24" "2014-09-23" "2014-09-22"
 [169] "2014-09-19" "2014-09-18" "2014-09-17" "2014-09-16" "2014-09-15" "2014-09-12"

This is a small portion. There are 5700 rows...

Upvotes: 1

Views: 6823

Answers (3)

hyunwoo jeong
hyunwoo jeong

Reputation: 1634

if you have a data frame,'a' including date variable,'date',

library(dplyr)
a<-mutate(a,endmonth=as.Date(paste0(as.numeric(format(date,"%Y%m"))+1,"01"),"%Y%m%d")-1)     

for example :

a<-data.frame(date=as.Date("2016-02-15","2016-04-05"))
a1<-mutate(a,endmonth=
             as.Date(paste0(as.numeric(format(date,"%Y%m"))+1,"01"),"%Y%m%d")-1)
>a1
        date   endmonth
1 2016-02-15 2016-02-29
2 2016-04-05 2016-04-30

Upvotes: 0

G. Grothendieck
G. Grothendieck

Reputation: 270248

1) Try tapply like this:

dates <- c("2015-05-13", "2015-05-12", "2015-05-11", "2015-04-27", 
      "2015-04-24", "2015-04-23")

tapply(dates, substr(dates, 1, 7), max)

or this:

library(zoo)
tapply(dates, as.yearmon(dates), max)

2) If the dates are in a data frame:

 DF <- data.frame(dates, stringsAsFactors = FALSE)

 aggregate(DF["dates"], list(month = substr(DF$dates, 1, 7)), max)

or replace substr(...) with as.yearmon(DF$dates).

3) This picks out the last row of each month. In this case ave returns a character variable so need to use as.logical to convert it to logical:

isMax <- function(x) seq_along(x) == which.max(as.Date(x))
subset(DF, as.logical(ave(dates, substr(dates, 1, 7), FUN = isMax)))

or use as.yearmon(dates) in place of substr(...). The following definition of isMax could be substituted

isMax <- function(x) seq_along(x) == which.max(as.Date(x))

and gives the same result except when there are multiple maxima. In that case it gives the first only whereas the earlier definition gives them all. As which.max does not work with "character" class we converted dates to "Date" class.

Next time please use dput to display your data or otherwise provide it in reproducible form as we did above. Also cut it down to the minimum amount needed to illustrate the problem.

Upvotes: 10

Molx
Molx

Reputation: 6931

Here's an alternative solution using dplyr to group by month and year and then filter the max i.e. last date of each yearmonth:

df <- data.frame(dates=as.Date(c("2015-05-21", "2015-05-20", "2015-05-19", "2015-05-18",
                 "2015-05-15", "2015-05-14","2014-12-05", "2014-12-04", "2014-12-03",
                 "2014-12-02", "2014-12-01", "2014-11-28")))

library(dplyr)

df %>% 
  group_by(strftime(dates, "%Y-%m")) %>% #Groups by the yearmonths
  filter(dates == max(dates)) %>%        #Take the last date of each group
  .$dates                                #Returns the filtered dates as a vector

Resulting in:

[1] "2015-05-21" "2014-12-05" "2014-11-28"

Thanks to @akrun for tips on fixing and improving the code.

Upvotes: 5

Related Questions