Reputation:
I have got two data.frames as follow:
df1 = read.table(text = 'Date ID
1980-12-01 2
1982-11-01 1
1990-01-01 6
1993-07-01 9', header = TRUE)
df2 = read.table(text = 'Date Var
1980-12-17 8
1982-11-07 9
1990-01-19 10
1993-07-20 22', header = TRUE)
I need to merge df1 with df2 by the column Date but by only considering the Year and Month, also because otherwise the merge does not work (because the Days are different).
My output should be like this:
Date ID Var
1980-12 2 8
1982-11 1 9
1990-01 6 10
1993-07 9 22
I considered to remove the Days in the Date columns but I would like to know if there is a quicker way to do this. I also checked and it seems that no one asked this question before.
Thanks
#UPDATE
As suggested in the comments, what if I have got multiple observations in the same Year and Month in one data.frame?
e.g.
df1 = read.table(text = 'Date ID #as 1st Case
1980-12-01 2
1982-11-01 1
1990-01-01 6
1993-07-01 9', header = TRUE)
df3 = read.table(text = 'Date Var
1980-12-17 8
1980-12-29 4
1980-12-30 1
1982-11-07 9
1982-11-12 1
1990-01-19 10
1990-01-22 21
1993-07-20 22
1993-07-26 12', header = TRUE)
My new output should be:
Date Var ID
1980-12 8 2
1980-12 4 2
1980-12 1 2
1982-11 9 1
1982-11 1 1
1990-01 10 6
1990-01 21 6
1993-07 22 9
1993-07 12 9
#
UPDATE
What if I have got multiple observations in the same Year and Month in both the data.frames?
df4 = read.table(text = 'Date ID
1980-12-01 1
1980-12-01 2
1982-11-01 3
1982-11-01 4
1990-01-01 5
1990-01-01 6
1993-07-01 7
1993-07-01 8', header = TRUE)
df3 = read.table(text = 'Date Var #as 2nd Case
1980-12-17 8
1980-12-29 4
1980-12-30 1
1982-11-07 9
1982-11-12 1
1990-01-19 10
1990-01-22 21
1993-07-20 22
1993-07-26 12', header = TRUE)
The new output should be:
Date Var ID
1980-12-17 8 1
1980-12-17 8 2
1980-12-29 4 1
1980-12-29 4 2
1980-12-30 1 1
1980-12-30 1 2
1982-11-07 9 3
1982-11-07 9 4
1982-11-12 1 3
1982-11-12 1 4
1990-01-19 10 5
1990-01-19 10 6
1990-01-22 21 5
1990-01-22 21 6
1993-07-20 22 7
1993-07-20 22 8
1993-07-26 12 7
1993-07-26 12 8
Upvotes: 5
Views: 10323
Reputation: 2076
You need to format the date into month and year format. then you do a regular merge.
df1$my=format(as.Date(df1$Date), "%Y-%m")
df2$my=format(as.Date(df2$Date), "%Y-%m")
merge(df1,df2, by='my')
For multiple observation across same year and month, this would still work.
df1$my=format(as.Date(df1$Date), "%Y-%m")
df3$my=format(as.Date(df3$Date), "%Y-%m")
merge(df1,df3, by='my')
my Date.x ID Date.y Var
1 1980-12 1980-12-01 2 1980-12-17 8
2 1980-12 1980-12-01 2 1980-12-29 4
3 1980-12 1980-12-01 2 1980-12-30 1
4 1982-11 1982-11-01 1 1982-11-07 9
5 1982-11 1982-11-01 1 1982-11-12 1
6 1990-01 1990-01-01 6 1990-01-19 10
7 1990-01 1990-01-01 6 1990-01-22 21
8 1993-07 1993-07-01 9 1993-07-20 22
9 1993-07 1993-07-01 9 1993-07-26 12
For update 3, one can use
merge(df4,df3, by='my', all = T)
Upvotes: 5
Reputation: 886948
We convert the 'Date' to Date
class, format
to year-month format in both the dataset and merge
it together.
merge(transform(df1, Date = format(as.Date(Date), "%Y-%m")),
transform(df2, Date = format(as.Date(Date), "%Y-%m")))
# Date ID Var
#1 1980-12 2 8
#2 1982-11 1 9
#3 1990-01 6 10
#4 1993-07 9 22
Or if there are many datasets, place it in a list
and do the transform and merge
Reduce(function(...) merge(...),
lapply(mget(paste0("df", 1:2)), transform, Date = format(as.Date(Date), "%Y-%m")))
Upvotes: 2