Reputation: 59
I have two dataframes with distinct columns between them, as follows:
DF1
month mbs rec_n vol_n rec_indice rec_mm
2021-05-01 hsp 164.84724 102.44143 1.5 0.6
2021-05-01 tvc 106.2428 81.89729 16.4 2.9
2021-07-01 tvc 111.51844 90.37325 -3.1 -3.4
2022-01-01 outro 74.33283 70.90493 0.8 1.2
DF2
month mbs prod_n prod_n_mm
2021-05-01 inds 164.84724 102.44143
2021-05-01 indf 106.2428 81.89729
2021-07-01 inds 111.51844 90.37325
2022-01-01 indf 74.33283 70.90493
I want to copy the columns and values of the respective columns of DF1 and 2, comparing the column month and mbs if they are equal. The month column should only be compared to the month, not the day. If the month of the "Month" column is the same and the value of the mbs column is also the same, copy the value. If these conditions are not met, set the value to zero, as shown in DF3
DF3
month mbs x y rec_n vol_n rec_indice rec_mm prod_n pro_n_mm
2021-05-06 hsp value value 164.84724 102.44143 1.5 0.6 0 0
2021-05-30 hsp value value 164.84724 102.44143 1.5 0.6 0 0
2021-05-18 tvc value value 106.2428 81.89729 16.4 2.9 0 0
2022-01-21 outro value value 74.33283 70.90493 0.8 1.2 0 0
2022-01-30 outro value value 74.33283 70.90493 0.8 1.2 0 0
2021-05-06 inds value value 0 0 0 0 164.84724 102.44143
2021-07-30 idnf value value 0 0 0 0 0 0
2021-07-22 inds value value 0 0 0 0 111.51844 90.37325
2022-01-21 indf value value 0 0 0 0 74.33283 70.90493
Remembering that the column x and y are columns that already exist in DF3. So I just want to complete DF3 with the columns of DF1 and DF2.
Upvotes: 0
Views: 473
Reputation: 59
left_join
did the trick. I know there is a more elegant solution, but I created a new column for the two dataframes with just month and year. Then I used left_join(df3, df2, by = c("month_year", "mbs"))
Upvotes: 0
Reputation: 173803
It sounds like you are looking for a full join:
library(tidyverse)
full_join(DF1, DF2, by = c('month', 'mbs'))
#> month mbs rec_n vol_n rec_indice rec_mm prod_n prod_n_mm
#> 1 2021-05-01 hsp 164.84724 102.44143 1.5 0.6 NA NA
#> 2 2021-05-01 tvc 106.24280 81.89729 16.4 2.9 NA NA
#> 3 2021-07-01 tvc 111.51844 90.37325 -3.1 -3.4 NA NA
#> 4 2022-01-01 outro 74.33283 70.90493 0.8 1.2 NA NA
#> 5 2021-05-01 inds NA NA NA NA 164.84724 102.44143
#> 6 2021-05-01 indf NA NA NA NA 106.24280 81.89729
#> 7 2021-07-01 inds NA NA NA NA 111.51844 90.37325
#> 8 2022-01-01 indf NA NA NA NA 74.33283 70.90493
If you really want zeros instead of NA
(which might lead to problems later) then you can do:
full_join(DF1, DF2, by = c('month', 'mbs')) %>%
replace_na(list(rec_n = 0, vol_n = 0, rec_indice = 0, rec_mm = 0,
prod_n = 0, prod_n_mm = 0))
#> month mbs rec_n vol_n rec_indice rec_mm prod_n prod_n_mm
#> 1 2021-05-01 hsp 164.84724 102.44143 1.5 0.6 0.00000 0.00000
#> 2 2021-05-01 tvc 106.24280 81.89729 16.4 2.9 0.00000 0.00000
#> 3 2021-07-01 tvc 111.51844 90.37325 -3.1 -3.4 0.00000 0.00000
#> 4 2022-01-01 outro 74.33283 70.90493 0.8 1.2 0.00000 0.00000
#> 5 2021-05-01 inds 0.00000 0.00000 0.0 0.0 164.84724 102.44143
#> 6 2021-05-01 indf 0.00000 0.00000 0.0 0.0 106.24280 81.89729
#> 7 2021-07-01 inds 0.00000 0.00000 0.0 0.0 111.51844 90.37325
#> 8 2022-01-01 indf 0.00000 0.00000 0.0 0.0 74.33283 70.90493
Created on 2022-09-27 with reprex v2.0.2
Upvotes: 2