user7109363
user7109363

Reputation:

Merge data.frame by Date (year and month)

1st Case

I have got two data.frames as follow:

df1 = read.table(text = 'Date  ID
1980-12-01  2
1982-11-01  1
1990-01-01  6
1993-07-01  9', header = TRUE)

df2 = read.table(text = 'Date  Var
1980-12-17  8
1982-11-07  9
1990-01-19  10
1993-07-20  22', header = TRUE)

I need to merge df1 with df2 by the column Date but by only considering the Year and Month, also because otherwise the merge does not work (because the Days are different).

My output should be like this:

Date    ID  Var
1980-12  2    8
1982-11  1    9
1990-01  6   10
1993-07  9   22

I considered to remove the Days in the Date columns but I would like to know if there is a quicker way to do this. I also checked and it seems that no one asked this question before.

Thanks

#

UPDATE

2nd Case

As suggested in the comments, what if I have got multiple observations in the same Year and Month in one data.frame?

e.g.

df1 = read.table(text = 'Date  ID #as 1st Case
1980-12-01  2
1982-11-01  1
1990-01-01  6
1993-07-01  9', header = TRUE)

df3 = read.table(text = 'Date  Var
    1980-12-17  8
    1980-12-29  4
    1980-12-30  1
    1982-11-07  9
    1982-11-12  1
    1990-01-19  10
    1990-01-22  21
    1993-07-20  22
    1993-07-26  12', header = TRUE)

My new output should be:

Date   Var    ID
1980-12  8     2
1980-12  4     2
1980-12  1     2
1982-11  9     1
1982-11  1     1
1990-01  10    6
1990-01  21    6
1993-07  22    9
1993-07  12    9
#

UPDATE

3rd Case

What if I have got multiple observations in the same Year and Month in both the data.frames?

df4 = read.table(text = 'Date  ID 
    1980-12-01  1
    1980-12-01  2
    1982-11-01  3
    1982-11-01  4
    1990-01-01  5
    1990-01-01  6
    1993-07-01  7
    1993-07-01  8', header = TRUE)

df3 = read.table(text = 'Date  Var      #as 2nd Case
    1980-12-17  8
    1980-12-29  4
    1980-12-30  1
    1982-11-07  9
    1982-11-12  1
    1990-01-19  10
    1990-01-22  21
    1993-07-20  22
    1993-07-26  12', header = TRUE)

The new output should be:

    Date       Var  ID
    1980-12-17   8   1
    1980-12-17   8   2
    1980-12-29   4   1
    1980-12-29   4   2
    1980-12-30   1   1
    1980-12-30   1   2
    1982-11-07   9   3
    1982-11-07   9   4
    1982-11-12   1   3
    1982-11-12   1   4
    1990-01-19   10  5
    1990-01-19   10  6
    1990-01-22   21  5
    1990-01-22   21  6
    1993-07-20   22  7
    1993-07-20   22  8
    1993-07-26   12  7
    1993-07-26   12  8

Upvotes: 5

Views: 10323

Answers (2)

Chirayu Chamoli
Chirayu Chamoli

Reputation: 2076

You need to format the date into month and year format. then you do a regular merge.

df1$my=format(as.Date(df1$Date), "%Y-%m")
df2$my=format(as.Date(df2$Date), "%Y-%m")
merge(df1,df2, by='my')

For multiple observation across same year and month, this would still work.

df1$my=format(as.Date(df1$Date), "%Y-%m")
df3$my=format(as.Date(df3$Date), "%Y-%m")
merge(df1,df3, by='my')
      my     Date.x ID     Date.y Var
1 1980-12 1980-12-01  2 1980-12-17   8
2 1980-12 1980-12-01  2 1980-12-29   4
3 1980-12 1980-12-01  2 1980-12-30   1
4 1982-11 1982-11-01  1 1982-11-07   9
5 1982-11 1982-11-01  1 1982-11-12   1
6 1990-01 1990-01-01  6 1990-01-19  10
7 1990-01 1990-01-01  6 1990-01-22  21
8 1993-07 1993-07-01  9 1993-07-20  22
9 1993-07 1993-07-01  9 1993-07-26  12

For update 3, one can use

merge(df4,df3, by='my', all = T)

Upvotes: 5

akrun
akrun

Reputation: 886948

We convert the 'Date' to Date class, format to year-month format in both the dataset and merge it together.

merge(transform(df1, Date = format(as.Date(Date), "%Y-%m")), 
             transform(df2, Date = format(as.Date(Date), "%Y-%m")))
#     Date ID Var
#1 1980-12  2   8
#2 1982-11  1   9
#3 1990-01  6  10
#4 1993-07  9  22

Or if there are many datasets, place it in a list and do the transform and merge

Reduce(function(...) merge(...), 
    lapply(mget(paste0("df", 1:2)), transform, Date = format(as.Date(Date), "%Y-%m")))

Upvotes: 2

Related Questions