stat77
stat77

Reputation: 113

Joining three tables in R each having different keys

I have three tables: CUSTOMER (PK: CustomerNum), RESERVATION (PK: CustomerNum and TripID NOT PK), and TRIP (PK: TripID). Trying to use purr:reduce.

Tried the following code.

tables <- list(customer, reservation, trip)
reduce(tables, inner_join, by = c("CustomerNum", "TripID"))

Error: by can't contain join column TripID which is missing from LHS

Upvotes: 0

Views: 109

Answers (1)

akrun
akrun

Reputation: 886948

We can use a for loop when the by columns are different in each step

grp <- c("CustomerNum", "TripID")
out <- customer
for(i in seq_along(grp)) {
     out <- inner_join(out, tables[[i+1]], by = grp[i])
  }

out
#  CustomerNum        val TripID newInfo
#1           1 -0.5458808      4       *
#2           2  0.5365853      2    ****

then select the columns of interest


Note that if we don't provide the by, it would automatically pick up the by based on the availability of common column name based on the reproducible example below. As the OP didn't provide any reproducible example, it is unclear about the situation

reduce(tables, inner_join)
#Joining, by = "CustomerNum"  #### <-----
#Joining, by = "TripID"       #### <-----
#  CustomerNum        val TripID newInfo
#1           1 -0.5458808      4       *
#2           2  0.5365853      2    ****

data

set.seed(24)
customer <- data.frame(CustomerNum = 1:5, val = rnorm(5))
reservation <- data.frame(CustomerNum = 1:3, TripID = c(4, 2, 8))
trip <- data.frame(TripID = c(4, 9, 2), newInfo = c("*", "**", "****"))
tables <- list(customer, reservation, trip)

Upvotes: 1

Related Questions