Lynn
Lynn

Reputation: 4398

How can I join two tables horizontally and match 2 different column names in R?

I have two dataframes:

data1:

ID               DateTimeUTC
 A               12/4/2019 11:30:30 PM
 A               12/4/2019 11:30:31 PM
 B               12/5/2019 11:31:00 PM
 B               12/5/2019 11:31:01 PM
 C               12/5/2019 11:31:02 PM

and data2:

 Message         DateTimeUTC
 A               12/4/2019 11:30:30 PM
 A               12/4/2019 11:30:31 PM
 B               12/5/2019 11:31:00 PM
 B               12/5/2019 11:31:01 PM

I would like to have

ID              DateTimeUTC               Message              DateTimeUTC
A               12/4/2019 11:30:30 PM      A           12/4/2019 11:30:30 PM
A               12/4/2019 11:30:31 PM      A           12/4/2019 11:30:31 PM
B               12/5/2019 11:31:00 PM      B           12/4/2019 11:31:00 PM
B               12/5/2019 11:31:01 PM      B           12/4/2019 11:31:01 PM

I wish to only show matching IDs and Messages. I have performed an inner join, but it is giving me duplicates, and it erases one of my column names.

 library('dplyr')
 inner_join(data1,  data2, by = c("ID" = "Message"))  

Goal: Can someone show me how to do an rbind to get the above outcome?

##pseudo_code:
 rbind(data1,data2, order_by ID & Message)

Upvotes: 0

Views: 438

Answers (2)

Colin C
Colin C

Reputation: 76

If you want to bind these two data sets together I would use cbind.

A caution though is is that this assumes that the data in each dataset is ordered correctly (i.e. the first 'A' in data1 actually goes with the first 'A' in data2).

library(tidyverse)

data1 <- tibble(ID = c("A", "A", "B", "B", "C"),
                    DateTimeUTC = c("12/4/2019 11:30:30 PM",
                                    "12/4/2019 11:30:31 PM",
                                    "12/5/2019 11:31:00 PM",
                                    "12/5/2019 11:31:01 PM",
                                    "12/5/2019 11:31:02 PM"))

data2 <- tibble(Message = c("A", "A", "B", "B"),
                    DateTimeUTC = c("12/4/2019 11:30:30 PM",
                                    "12/4/2019 11:30:31 PM",
                                    "12/5/2019 11:31:00 PM",
                                    "12/5/2019 11:31:01 PM"))
# remove 'C' from the list
data1 <- data1 %>% filter(ID != "C")

# bind datasets togethers as per example above.
data_bind <- cbind(data1, data2)

Upvotes: 0

Nareman Darwish
Nareman Darwish

Reputation: 1261

Actually the idea of inner_join is correct, the problem is that instead of joining only on "ID" = "Message" you should also consider DateTimeUTC. So it is joining on 2 conditions as follows;

library(dplyr)

df1 <-
  data.frame(
    ID = c("A", "A", "B", "B", "C"),
    DateTimeUTC = c("12/4/2019 11:30:30 PM", "12/4/2019 11:30:31 PM", "12/5/2019 11:31:00 PM", 
                    "12/5/2019 11:31:01 PM", "12/5/2019 11:31:02 PM"),
    stringsAsFactors = FALSE
  )

df2 <-
  data.frame(
    Message = c("A", "A", "B", "B"),
    DateTimeUTC = c("12/4/2019 11:30:30 PM", "12/4/2019 11:30:31 PM", 
                    "12/5/2019 11:31:00 PM", "12/5/2019 11:31:01 PM"),
    stringsAsFactors = FALSE
  )

df1 %>%
  inner_join(df2, by = c("ID" = "Message", "DateTimeUTC" = "DateTimeUTC"))

# ID           DateTimeUTC
# A 12/4/2019 11:30:30 PM
# A 12/4/2019 11:30:31 PM
# B 12/5/2019 11:31:00 PM
# B 12/5/2019 11:31:01 PM

Upvotes: 1

Related Questions