Reputation: 305
I have two dataframes. The df1
one looks like:
Day Element Incident
1 2020-04-06 3101 Check incident by SOILING
2 2020-04-02 3102 Check alarm 5662
3 2020-05-21 3101 Check energy loss by METEO ERROR
4 2020-04-02 3202 Check ACDC grid
The other one, df2
, looks like this:
Day Element Incident Energy_loss
1 2020-04-06 3101 SOILING 0.05
2 2020-04-14 3101 SOILING 0.01
3 2020-05-21 3101 METEO ERROR 0.11
4 2020-06-15 3102 METEO ERROR 0.03
I would like to merge them based on the columns Day
, Element
and Incident
, so I need to find when the column Incident
in df1
contains the column Incident
of df2
. The rows where df1
doesn't have a match with df2
can be left with a Nan
in the Energy loss
column.
I've tried with the usual merge, but as one of the conditions of the merge
is by a substring, it's not working properly.
The output I expect is:
Day Element Incident Energy loss
1 2020-04-06 3101 Check incident by SOILING 0.05
2 2020-04-02 3102 Check alarm 5662 Nan
3 2020-05-21 3101 Check energy loss by METEO ERROR 0.11
4 2020-04-02 3202 Check ACDC grid Nan
Upvotes: 2
Views: 44
Reputation: 887118
We could use regex_left_join
library(dplyr)
library(fuzzyjoin)
regex_left_join(df1, df2, by = c('Day', 'Element', 'Incident')) %>%
select(Day = Day.x, Element = Element.x, Incident = Incident.x, Energy_loss)
-output
# Day Element Incident Energy_loss
#1 2020-04-06 3101 Check incident by SOILING 0.05
#2 2020-04-02 3102 Check alarm 5662 NA
#3 2020-05-21 3101 Check energy loss by METEO ERROR 0.11
#4 2020-04-02 3202 Check ACDC grid NA
df1 <- structure(list(Day = c("2020-04-06", "2020-04-02", "2020-05-21",
"2020-04-02"), Element = c(3101L, 3102L, 3101L, 3202L),
Incident = c("Check incident by SOILING",
"Check alarm 5662", "Check energy loss by METEO ERROR", "Check ACDC grid"
)), class = "data.frame", row.names = c("1", "2", "3", "4"))
df2 <- structure(list(Day = c("2020-04-06", "2020-04-14", "2020-05-21",
"2020-06-15"), Element = c(3101L, 3101L, 3101L, 3102L), Incident = c("SOILING",
"SOILING", "METEO ERROR", "METEO ERROR"), Energy_loss = c(0.05,
0.01, 0.11, 0.03)), class = "data.frame", row.names = c("1",
"2", "3", "4"))
Upvotes: 4