Merge two data frames based on multiple columns in R

Question

I have two data frames looking like that

data frame 1:

P.X    value
OOPA     5
POKA    4
JKIO    3
KOPP    1

data frame 2:

P.X.1   P.X.2   P.X.3  P.X.4    mass
        JKIO    UIX    HOP       56
                CX     OOPA      44
EDD             POKA             13
KOPP    FOSI                     11

and I want to merge the two data files based on the df1 P.X and df2 P.X.1,P.X.2,P.X.3,P.X.4. So if it the JKIO in P.X.2. appears in the P.X one then merge them in a new data frame in the same row JKIO, 3, 56 as below:

data frame new:

P.X    value  mass
OOPA    5      44
POKA    4      13
JKIO    3      56
KOPP    1      11

Do you know how can I do it maybe with

merge(df1,df2 by(P.X == P.X.1 | P.X.2 | P.X.3 | P.X.4)

?

jazzurro · Accepted Answer

The following is one way to achieve your goal. You want to convert df2 to a long-format data and get rows that have more than 1 character. Once you have this data, you merge df1 with the updated df2.

library(dplyr)
library(tidyr)

left_join(df1,
          pivot_longer(df2, cols = P.X.1:P.X.4, names_to = "foo",
                       values_to = "P.X") %>% filter(nchar(P.X) > 0),
          by = "P.X") %>% 
select(-foo)

   P.X value mass
1 OOPA     5   44
2 POKA     4   13
3 JKIO     3   56
4 KOPP     1   11

DATA

df1 <- structure(list(P.X = c("OOPA", "POKA", "JKIO", "KOPP"), value = c(5L, 
4L, 3L, 1L)), class = "data.frame", row.names = c(NA, -4L))

df2 <- structure(list(P.X.1 = c("", "", "EDD", "KOPP"), P.X.2 = c("JKIO", 
"", "", "FOSI"), P.X.3 = c("UIX", "CX", "POKA", ""), P.X.4 = c("HOP", 
"OOPA", "", ""), mass = c(56, 44, 13, 11)), row.names = c(NA, 
-4L), class = c("tbl_df", "tbl", "data.frame"))

Merge two data frames based on multiple columns in R

Answers (2)

Related Questions