Subset specific dates (year and month) from data.frame

Here my data.frame:

df = read.table(text = 'ID   Date
1      1975-01-01
2      1980-02-01
3      1985-05-01
4      1990-07-01
5      1990-08-01
6      1993-01-01
7      1993-09-01', header = TRUE)

I need to create a subset from it by selecting specific dates (year and month).

The dates I am interested in are:

dates = c('1980-02', '1990-07', '1993-09')

and therefore my output should be:

ID     Date
2      1980-02-01
4      1990-07-01
7      1993-09-01

Is there a way to write a single piece of code that does this starting from df and dates, without manually typing the dates into the code?

Thanks

####UPDATE

What if I have got multiple observations in the same month within df, like:

df2 = read.table(text = 'ID   Date
1      1975-01-01
2      1980-02-01
9      1980-02-01
3      1985-05-01
4      1990-07-01
12     1990-07-01
16     1990-07-01
5      1990-08-01
6      1993-01-01
7      1993-09-01
67     1993-09-01', header = TRUE)

New output:

ID     Date
2      1980-02-01
9      1980-02-01
4      1990-07-01
12     1990-07-01
16     1990-07-01
7      1993-09-01
67     1993-09-01

Thanks

Upvotes: 0

Answers (5)

CoderGuy123

Reputation: 6649

Your dates are formatted as strings, so they are somewhat difficult to work with. In general, it's better to store them as actual date objects. This can be done using the lubridate package. Here's what I would do. I use the readr package for automatic type detection, and purrr for functional programming.

library(pacman)
p_load(lubridate, readr, purrr)

df = read_table(
'ID   Date
1      1975-01-01
2      1980-02-01
9      1980-02-01
3      1985-05-01
4      1990-07-01
12     1990-07-01
16     1990-07-01
5      1990-08-01
6      1993-01-01
7      1993-09-01
67     1993-09-01'
)

dates = parse_date_time(c('1980-02', '1990-07', '1993-09'), orders = "Y-m")

#subset
df[year(df$Date) %in% year(dates) & month(df$Date) %in% month(dates), ]

The output of which is:

# A tibble: 7 × 2
     ID       Date
  <int>     <date>
1     2 1980-02-01
2     9 1980-02-01
3     4 1990-07-01
4    12 1990-07-01
5    16 1990-07-01
6     7 1993-09-01
7    67 1993-09-01

So, we load the data as you did, but do it with readr so that it automatically recognizes the dates. Then we subset the rows where the year matches any of the years in the dates object, and the month matches any of the months in the dates object. This gives the output you wanted.

However, perhaps you only want the combinations you gave. So e.g. year 1980 is only okay if it occurs with month 2. If so, it's a bit more complicated. This could be done in many ways, but I settled on a functional approach. This is not the fastest to execute, but it is quick to write and very flexible.

#subset stricter
inclusion_func = function(x, desired_dates) {
  #loop over each date
  map_lgl(x, function(date) {
    any(map_lgl(desired_dates, function(desired_date) {
      year(date) == year(desired_date) && month(date) == month(desired_date)
    })
    )
  })
}

df[inclusion_func(df$Date, dates), ]

The output of which is the same:

# A tibble: 7 × 2
     ID       Date
  <int>     <date>
1     2 1980-02-01
2     9 1980-02-01
3     4 1990-07-01
4    12 1990-07-01
5    16 1990-07-01
6     7 1993-09-01
7    67 1993-09-01

What the function does is loop over each date in the data frame, and for each one loop over every desired year/month combination. It then checks whether both the year and the month match for that particular combination. If any of the three combinations matches (hence the any), it returns TRUE for that row.
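The difference between the two filters only shows up when a row shares its year with one target and its month with another. Here is a minimal base R sketch of that case, together with a vectorized version of the strict check. The data frame df3 and the names dates_chr, loose and strict are made up for illustration and are not part of the question.

```r
# Hypothetical data: 1980-07-01 has the year of one target and the month of another.
df3 <- data.frame(
  ID   = c(1, 2, 3, 4),
  Date = as.Date(c("1980-02-01", "1980-07-01", "1990-07-01", "1993-01-01"))
)
dates_chr <- c("1980-02", "1990-07", "1993-09")

# Loose filter: year and month are tested independently of each other.
target_years  <- substr(dates_chr, 1, 4)
target_months <- substr(dates_chr, 6, 7)
loose <- format(df3$Date, "%Y") %in% target_years &
  format(df3$Date, "%m") %in% target_months

# Strict filter: the year-month pair is tested as a single key.
strict <- format(df3$Date, "%Y-%m") %in% dates_chr

df3$ID[loose]   # 1 2 3 (1980-07-01 slips through)
df3$ID[strict]  # 1 3
```

Because format() and %in% are both vectorized, the strict version needs no explicit loop over rows or targets.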

Upvotes: 2

G. Grothendieck

Reputation: 269461

Here are some solutions. They (i) work with any dates, not just ones that are the first of the month, (ii) preserve the order of df2 in the output, (iii) are compact, i.e. one line each and do not require mentioning df2 multiple times.

1) substr This uses no packages.

subset(df2, substr(Date, 1, 7) %in% dates)

giving:

   ID       Date
2   2 1980-02-01
3   9 1980-02-01
5   4 1990-07-01
6  12 1990-07-01
7  16 1990-07-01
10  7 1993-09-01
11 67 1993-09-01
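As a quick check of point (i), the same call can be tried on dates that do not fall on the first of the month. The data frame df_mid below is invented for this sketch.

```r
# Hypothetical data with days other than the first of the month.
df_mid <- read.table(text = "ID Date
1 1980-02-17
2 1980-03-01
3 1990-07-31
4 1993-09-05", header = TRUE)
dates <- c("1980-02", "1990-07", "1993-09")

# The first 7 characters of "YYYY-MM-DD" are the "YYYY-MM" key.
res <- subset(df_mid, substr(Date, 1, 7) %in% dates)
res$ID  # 1 3 4
```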

2) zoo::as.yearmon Another possibility is to convert both Date and dates to "yearmon" class giving the same result. This code is a bit nicer but does need a package.

library(zoo)
subset(df2, as.yearmon(Date) %in% as.yearmon(dates))

Upvotes: 1

Ronak Shah

Reputation: 388817

According to the data you have shared, the class of the Date column is factor. We convert it to Date class, extract the year and month part, and match it against the dates vector to get the row numbers of df that match.

df[match(dates, format(as.Date(df$Date), "%Y-%m")), ]  

#   ID       Date
#2  2 1980-02-01
#4  4 1990-07-01
#7  7 1993-09-01

As per the updated question, if several rows can match the same date, we can use %in% instead, which gives you the desired output.

df2[format(as.Date(df2$Date), "%Y-%m") %in% dates, ]

#   ID       Date
#2   2 1980-02-01
#3   9 1980-02-01
#5   4 1990-07-01
#6  12 1990-07-01
#7  16 1990-07-01
#10  7 1993-09-01
#11 67 1993-09-01
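The reason for the switch is that match() returns only the position of the first match for each element of dates, while %in% flags every matching row. A small sketch with a made-up vector ym of year-month keys:

```r
dates <- c("1980-02", "1990-07", "1993-09")
# Hypothetical year-month keys with repeated months.
ym <- c("1975-01", "1980-02", "1980-02", "1990-07", "1990-07", "1993-09")

first_only <- match(dates, ym)      # first match per target: 2 4 6
all_rows   <- which(ym %in% dates)  # every matching position: 2 3 4 5 6
```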

Upvotes: 0

user7109363

As @eipi10 pointed out in the comments:

df[df$Date %in% as.Date(paste0(dates,"-01")), ]

This works well for my updated question.
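Note that this compares full dates, so it relies on every date being the first of the month, as in the question's data. A small sketch of the difference, assuming a made-up Date vector d that contains a mid-month value:

```r
dates <- c("1980-02", "1990-07", "1993-09")
# Hypothetical dates; the second one is not the first of the month.
d <- as.Date(c("1980-02-01", "1980-02-15", "1990-07-01"))

exact_first <- d %in% as.Date(paste0(dates, "-01"))  # TRUE FALSE TRUE
by_month    <- format(d, "%Y-%m") %in% dates         # TRUE TRUE TRUE
```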

Thanks

Upvotes: 0

G5W

Reputation: 37641

Try

    S = sapply(dates, function(d) { grep(d, df[,2]) })
    df[S,]
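With the updated data, where a month can match several rows, sapply may return a list rather than a plain index vector, and the list cannot be used directly as a row index. One possible variant, sketched here under that assumption, builds a single pattern and uses grepl to get a logical vector instead. The shortened df2 and the names pattern and keep are only for illustration.

```r
# Shortened version of df2 from the question, for illustration only.
df2 <- read.table(text = "ID Date
1 1975-01-01
2 1980-02-01
9 1980-02-01
4 1990-07-01
12 1990-07-01
7 1993-09-01", header = TRUE)
dates <- c("1980-02", "1990-07", "1993-09")

# One regular expression matching any target month at the start of the string.
pattern <- paste0("^(", paste(dates, collapse = "|"), ")")
keep <- grepl(pattern, df2$Date)
df2[keep, ]  # rows with ID 2, 9, 4, 12, 7
```

A logical vector also keeps the rows in their original order, whatever the order of dates.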

Upvotes: 0
