Reputation: 4398
I have two dataframes:
data1:
ID DateTimeUTC
A 12/4/2019 11:30:30 PM
A 12/4/2019 11:30:31 PM
B 12/5/2019 11:31:00 PM
B 12/5/2019 11:31:01 PM
C 12/5/2019 11:31:02 PM
and data2:
Message DateTimeUTC
A 12/4/2019 11:30:30 PM
A 12/4/2019 11:30:31 PM
B 12/5/2019 11:31:00 PM
B 12/5/2019 11:31:01 PM
I would like to have
ID DateTimeUTC Message DateTimeUTC
A 12/4/2019 11:30:30 PM A 12/4/2019 11:30:30 PM
A 12/4/2019 11:30:31 PM A 12/4/2019 11:30:31 PM
B 12/5/2019 11:31:00 PM B 12/4/2019 11:31:00 PM
B 12/5/2019 11:31:01 PM B 12/4/2019 11:31:01 PM
I wish to only show matching IDs and Messages. I have performed an inner join, but it is giving me duplicates, and it erases one of my column names.
library('dplyr')
inner_join(data1, data2, by = c("ID" = "Message"))
Goal: Can someone show me how to do an rbind to get the above outcome?
##pseudo_code:
rbind(data1,data2, order_by ID & Message)
Upvotes: 0
Views: 438
Reputation: 76
If you want to bind these two data sets together I would use cbind
.
A caution though is is that this assumes that the data in each dataset is ordered correctly (i.e. the first 'A' in data1 actually goes with the first 'A' in data2).
library(tidyverse)
data1 <- tibble(ID = c("A", "A", "B", "B", "C"),
DateTimeUTC = c("12/4/2019 11:30:30 PM",
"12/4/2019 11:30:31 PM",
"12/5/2019 11:31:00 PM",
"12/5/2019 11:31:01 PM",
"12/5/2019 11:31:02 PM"))
data2 <- tibble(Message = c("A", "A", "B", "B"),
DateTimeUTC = c("12/4/2019 11:30:30 PM",
"12/4/2019 11:30:31 PM",
"12/5/2019 11:31:00 PM",
"12/5/2019 11:31:01 PM"))
# remove 'C' from the list
data1 <- data1 %>% filter(ID != "C")
# bind datasets togethers as per example above.
data_bind <- cbind(data1, data2)
Upvotes: 0
Reputation: 1261
Actually the idea of inner_join is correct, the problem is that instead of joining only on "ID" = "Message" you should also consider DateTimeUTC. So it is joining on 2 conditions as follows;
library(dplyr)
df1 <-
data.frame(
ID = c("A", "A", "B", "B", "C"),
DateTimeUTC = c("12/4/2019 11:30:30 PM", "12/4/2019 11:30:31 PM", "12/5/2019 11:31:00 PM",
"12/5/2019 11:31:01 PM", "12/5/2019 11:31:02 PM"),
stringsAsFactors = FALSE
)
df2 <-
data.frame(
Message = c("A", "A", "B", "B"),
DateTimeUTC = c("12/4/2019 11:30:30 PM", "12/4/2019 11:30:31 PM",
"12/5/2019 11:31:00 PM", "12/5/2019 11:31:01 PM"),
stringsAsFactors = FALSE
)
df1 %>%
inner_join(df2, by = c("ID" = "Message", "DateTimeUTC" = "DateTimeUTC"))
# ID DateTimeUTC
# A 12/4/2019 11:30:30 PM
# A 12/4/2019 11:30:31 PM
# B 12/5/2019 11:31:00 PM
# B 12/5/2019 11:31:01 PM
Upvotes: 1