Here is my data.frame:
df = read.table(text = 'ID Date
1 1975-01-01
2 1980-02-01
3 1985-05-01
4 1990-07-01
5 1990-08-01
6 1993-01-01
7 1993-09-01', header = TRUE)
I need to create a subset from it by selecting specific dates (year and month).
The dates I am interested in are:
dates = c('1980-02', '1990-07', '1993-09')
and therefore my output should be:
ID Date
2 1980-02-01
4 1990-07-01
7 1993-09-01
Is there a way to write a single piece of code that does this starting from df and dates, without manually adding the dates to the code?
Thanks
#### UPDATE
What if I have multiple observations in the same month within df, like:
df2 = read.table(text = 'ID Date
1 1975-01-01
2 1980-02-01
9 1980-02-01
3 1985-05-01
4 1990-07-01
12 1990-07-01
16 1990-07-01
5 1990-08-01
6 1993-01-01
7 1993-09-01
67 1993-09-01', header = TRUE)
New output:
ID Date
2 1980-02-01
9 1980-02-01
4 1990-07-01
12 1990-07-01
16 1990-07-01
7 1993-09-01
67 1993-09-01
Thanks
Upvotes: 0
Views: 2334
Reputation: 6649
Your dates are formatted as strings, so they are somewhat difficult to work with. In general, it's better to store them as actual date objects. This can be done using the lubridate package. Here's what I would do: I use the readr package for automatic type detection and purrr for functional programming.
library(pacman)
p_load(lubridate, readr, purrr)
df = read_table(
'ID Date
1 1975-01-01
2 1980-02-01
9 1980-02-01
3 1985-05-01
4 1990-07-01
12 1990-07-01
16 1990-07-01
5 1990-08-01
6 1993-01-01
7 1993-09-01
67 1993-09-01'
)
dates = parse_date_time(c('1980-02', '1990-07', '1993-09'), orders = "Y-m")
#subset
df[year(df$Date) %in% year(dates) & month(df$Date) %in% month(dates), ]
The output of which is:
# A tibble: 7 × 2
ID Date
<int> <date>
1 2 1980-02-01
2 9 1980-02-01
3 4 1990-07-01
4 12 1990-07-01
5 16 1990-07-01
6 7 1993-09-01
7 67 1993-09-01
So, we load the data as you did, but with readr so that the dates are recognized automatically. Then we subset the rows where the year matches any of the years in the dates object, and the month matches any of the months in the dates object. This gives the output you wanted.
However, perhaps you only want the combinations you gave. So e.g. year 1980 is only okay if it occurs with month 2. If so, it's a bit more complicated. This could be done in many ways, but I settled on a functional approach. This is not the fastest to execute, but it is quick to write and very flexible.
# stricter subset
inclusion_func = function(x, desired_dates) {
  # loop over each date
  map_lgl(x, function(date) {
    any(map_lgl(desired_dates, function(desired_date) {
      year(date) == year(desired_date) && month(date) == month(desired_date)
    }))
  })
}
df[inclusion_func(df$Date, dates), ]
The output of which is the same:
# A tibble: 7 × 2
ID Date
<int> <date>
1 2 1980-02-01
2 9 1980-02-01
3 4 1990-07-01
4 12 1990-07-01
5 16 1990-07-01
6 7 1993-09-01
7 67 1993-09-01
What the function does is loop over each date in the data frame and, for each one, loop over each year/month combination. It then checks whether both the year and the month match for that particular combination. If any of the three combinations matches (hence the any), it returns TRUE for that row.
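To make the difference between the two filters concrete, here is a minimal base-R sketch. It is not the code above: it swaps the nested loops for a single formatted year-month key, and it adds one hypothetical row (ID 99, 1990-02-01) that is not in the original data, chosen because its year and its month each appear in dates but never together.

```r
# Sample data in the same shape as the question, plus one hypothetical row
# (ID 99) whose year and month both occur in `dates`, but not as a pair.
df <- data.frame(
  ID   = c(1, 2, 4, 5, 99),
  Date = as.Date(c("1975-01-01", "1980-02-01", "1990-07-01",
                   "1990-08-01", "1990-02-01"))
)
dates <- c("1980-02", "1990-07", "1993-09")

# Loose filter: year and month are tested independently.
yr <- as.integer(substr(dates, 1, 4))
mo <- as.integer(substr(dates, 6, 7))
loose <- as.integer(format(df$Date, "%Y")) %in% yr &
  as.integer(format(df$Date, "%m")) %in% mo

# Strict filter: the year-month pair is tested as one key.
strict <- format(df$Date, "%Y-%m") %in% dates

df$ID[loose]   # includes the hypothetical ID 99
df$ID[strict]  # excludes it
```

On the data in the question both filters happen to agree, which is why the two outputs above are identical; they only diverge once a row like ID 99 is present.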
Upvotes: 2
Reputation: 269461
Here are some solutions. They (i) work with any dates, not just ones that are the first of the month, (ii) preserve the order of df2 in the output, and (iii) are compact, i.e. one line each, and do not require mentioning df2 multiple times.
1) substr This uses no packages.
subset(df2, substr(Date, 1, 7) %in% dates)
giving:
ID Date
2 2 1980-02-01
3 9 1980-02-01
5 4 1990-07-01
6 12 1990-07-01
7 16 1990-07-01
10 7 1993-09-01
11 67 1993-09-01
2) zoo::as.yearmon Another possibility is to convert both Date and dates to "yearmon" class, giving the same result. This code is a bit nicer but does need a package.
library(zoo)
subset(df2, as.yearmon(Date) %in% as.yearmon(dates))
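As a quick check of points (i) and (ii), the sketch below runs the substr solution on a small hypothetical data frame, df3, which is not part of the question: its days are not the first of the month and its rows are deliberately out of chronological order. stringsAsFactors = FALSE is set only so the column type is the same across R versions.

```r
# Hypothetical data: same layout as df2, but with mid-month days
# and rows out of chronological order.
df3 <- read.table(text = 'ID Date
7 1993-09-17
2 1980-02-29
5 1990-08-03
4 1990-07-31', header = TRUE, stringsAsFactors = FALSE)
dates <- c('1980-02', '1990-07', '1993-09')

# Characters 1 to 7 of "YYYY-MM-DD" form the "YYYY-MM" key.
res <- subset(df3, substr(Date, 1, 7) %in% dates)
res$ID  # rows kept in their original df3 order
```

The day part never enters the comparison, and subset() filters rows in place, so the surviving rows keep the order they had in df3.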
Upvotes: 1
Reputation: 388817
According to the data you have shared, the class of the Date column is factor. We convert it to Date class, extract the year and month part, and match it against the dates vector to get the row numbers of df that match.
df[match(dates, format(as.Date(df$Date), "%Y-%m")), ]
# ID Date
#2 2 1980-02-01
#4 4 1990-07-01
#7 7 1993-09-01
As per the updated question, if there are multiple rows to match for each date, we can use %in% instead, which gives the desired output.
df2[format(as.Date(df2$Date), "%Y-%m") %in% dates, ]
# ID Date
#2 2 1980-02-01
#3 9 1980-02-01
#5 4 1990-07-01
#6 12 1990-07-01
#7 16 1990-07-01
#10 7 1993-09-01
#11 67 1993-09-01
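The reason the first snippet needs replacing for the updated data can be seen in a small sketch. It uses a shortened, four-row version of df2 (an assumption made to keep the example small) and compares what match() and %in% return.

```r
# Shortened version of df2: two rows per month, and no row for 1993-09.
df2 <- read.table(text = 'ID Date
2 1980-02-01
9 1980-02-01
4 1990-07-01
12 1990-07-01', header = TRUE, stringsAsFactors = FALSE)
dates <- c('1980-02', '1990-07', '1993-09')

ym <- format(as.Date(df2$Date), "%Y-%m")

# match() returns, for each entry of `dates`, the index of its FIRST hit
# in ym (NA when there is none), so repeated months are dropped.
first_hit <- match(dates, ym)

# %in% returns one TRUE/FALSE per row of df2, so every matching row is kept.
all_hits <- which(ym %in% dates)
```

Note that a month in dates with no matching row gives NA in the match() result, and indexing a data frame with NA produces a row of NAs; the %in% form has no such case because it only ever flags existing rows.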
Upvotes: 0
Reputation:
As @eipi10 pointed out in the comments:
df[df$Date %in% as.Date(paste0(dates,"-01")), ]
This works well with my updated question.
Thanks
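One caveat worth illustrating: appending "-01" compares exact dates, so it relies on every observation falling on the first of the month, as in the data above. The sketch below uses a hypothetical two-row data frame, df4 (not from the question), with the Date column already converted to Date class, and contrasts it with a year-month comparison.

```r
# Hypothetical rows: one on the 1st of a wanted month, one mid-month.
df4 <- data.frame(ID   = c(2, 9),
                  Date = as.Date(c("1980-02-01", "1980-02-15")))
dates <- c('1980-02', '1990-07', '1993-09')

# Exact-date comparison: only the first of each month can match.
exact <- df4$Date %in% as.Date(paste0(dates, "-01"))

# Year-month comparison: any day in the month matches.
by_month <- format(df4$Date, "%Y-%m") %in% dates

df4[exact, ]     # only ID 2
df4[by_month, ]  # IDs 2 and 9
```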
Upvotes: 0