Reputation: 1613
I have two dataframes with two columns each (Date and data). The lenght of the columns differs. What I want to do is to delete the rows in df1 that are not in df2 by Date.
An example will clarify. These are my dataframes:
df1 = cbind(data.frame(Date = seq(as.Date("2018-11-1"), as.Date("2020-02-1"), by = "months"), stringsAsFactors = F), data.frame(Data = rnorm(16, 0, 1), stringsAsFactors = F))
Date Data
1 2018-11-01 1.09433662
2 2018-12-01 -0.27538189
3 2019-01-01 -0.19712728
4 2019-02-01 0.99852535
5 2019-03-01 -0.50760024
6 2019-04-01 -0.43127396
7 2019-05-01 0.90685965
8 2019-06-01 0.51510503
9 2019-07-01 -0.39070644
10 2019-08-01 1.27976428
11 2019-09-01 -0.63845519
12 2019-10-01 -0.05489751
13 2019-11-01 -0.87745923
14 2019-12-01 0.18082375
15 2020-01-01 0.08852416
16 2020-02-01 1.50827788
df2= cbind(data.frame(Date = df1$Date[c(1:5,7:9,11:13,15:16)]), data.frame(Data = c(1.09433662,-0.27538189, 0.99852535,-0.50760024,-0.43127396, 0.90685965,-0.39070644, 1.27976428,-0.63845519,-0.05489751,-0.87745923, 0.18082375, 1.50827788)))
Date Data
1 2018-11-01 1.09433662
2 2018-12-01 -0.27538189
3 2019-01-01 0.99852535
4 2019-02-01 -0.50760024
5 2019-03-01 -0.43127396
6 2019-05-01 0.90685965
7 2019-06-01 -0.39070644
8 2019-07-01 1.27976428
9 2019-09-01 -0.63845519
10 2019-10-01 -0.05489751
11 2019-11-01 -0.87745923
12 2020-01-01 0.18082375
13 2020-02-01 1.50827788
What I want now is that df1
is reduced to the same length as df2
by deleting the rows that are not in df2
. The rows to be deleted correspond to the missing months in df2
.
The result would be this for df1
:
#df1 where the rows corresponding to the missing months in df2 have been deleted
Date Data
1 2018-11-01 1.09433662
2 2018-12-01 -0.27538189
3 2019-01-01 -0.19712728
4 2019-02-01 0.99852535
5 2019-03-01 -0.50760024
6 2019-05-01 0.90685965
7 2019-06-01 0.51510503
8 2019-07-01 -0.39070644
9 2019-09-01 -0.63845519
10 2019-10-01 -0.05489751
11 2019-11-01 -0.87745923
12 2020-01-01 0.08852416
13 2020-02-01 1.50827788
Can anyone help me?
Thanks a lot!
Upvotes: 0
Views: 84
Reputation: 23608
semi_join
from dplyr does what you are looking for. Note that your copied the data from df2 as the output example.
library(dplyr)
semi_join(df1, df2, by = "Date")
Date Data
1 2018-11-01 0.38376758
2 2018-12-01 -0.28738352
3 2019-01-01 1.79556305
4 2019-02-01 -0.34680836
5 2019-03-01 0.57803280
6 2019-05-01 1.96801082
7 2019-06-01 0.38448708
8 2019-07-01 0.39829417
9 2019-09-01 0.94912096
10 2019-10-01 -0.04469681
11 2019-11-01 0.32008546
12 2020-01-01 1.09054839
13 2020-02-01 -1.45438502
and anti_join
shows the records that should be removed.
anti_join(df1, df2, by = "Date")
Date Data
1 2019-04-01 2.1303783
2 2019-08-01 1.6907800
3 2019-12-01 -0.8593388
Upvotes: 2