Madiba
Madiba

Reputation: 23

How to merge two data sets by IDs based on another variable in R

I want to merge two data sets with ID. Data2 is a subset of the larger data set (data1) and it was created by removing Ids which were tagged 1 in data1. In the sample data, the two data sets have the same Ids until ID 426. With ID 427 when the tag is 1 in the first data, the second data removes that entry and continues with sequential ID. Thus Id 427 in the second data is 428 in the first. Likewise ID 1865 in the second data is 1867 in the first. How could I merge this data sets? I have provided a single sample data on-line. The first data has variables id, date and tag while the second has id1 and date1

> dput(data1)
structure(list(id = c(426L, 427L, 428L, 429L, 430L, 431L, 432L, 
1865L, 1866L, 1867L, 1868L, 1869L, 1870L, 1871L, 2388L, 2389L, 
2390L, 2391L, 2965L, 2966L, 2967L, 2968L, 2969L, 2970L), date = structure(c(11250, 
7308, 12436, 9919, 13372, 9526, 8232, 7306, 9872, 7398, 10332, 
12967, 14288, 14053, 7311, 10268, 7772, 14477, 7309, 10228, 8917, 
10216, 10873, 8065), class = "Date"), tag = c(0L, 1L, 0L, 0L, 
0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 
0L, 0L, 0L, 0L)), datalabel = "", time.stamp = "24 Jan 2015 22:01", .Names = c("id", 
"date", "tag"), formats = c("%8.0g", "%tdDD_mon_CCYY", "%8.0g"
), types = c(252L, 254L, 251L), val.labels = c("", "", ""), var.labels = c("", 
"", ""), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", 
"9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", 
"20", "21", "22", "23", "24"), version = 12L, class = "data.frame")
> 



> dput(data2)
structure(list(id1 = c(426L, 427L, 428L, 429L, 430L, 431L, 432L, 
1865L, 1866L, 1867L, 1868L, 1869L, 1870L, 1871L, 2388L, 2389L, 
2390L, 2391L, 2965L, 2966L, 2967L, 2968L, 2969L, 2970L), date1 = structure(c(11250, 
12436, 9919, 13372, 9526, 8232, 13787, 7398, 10332, 12967, 14288, 
14053, 11620, 11426, 14477, 11464, 9029, 11875, 10873, 8065, 
11233, 13848, 10204, 9535), class = "Date")), datalabel = "", time.stamp = "24 Jan 2015 22:00", .Names = c("id1", 
"date1"), formats = c("%8.0g", "%tdDD_mon_CCYY"), types = c(252L, 
254L), val.labels = c("", ""), var.labels = c("", ""), row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24"
), version = 12L, class = "data.frame")

Upvotes: 2

Views: 558

Answers (2)

tomtom
tomtom

Reputation: 259

As far as I could read from your comments on other posts you want to select those rows from data1 that have a tag of 0. This can be achieved using

data2 <- data1[data1$tag == 0,]

Upvotes: 0

Hack-R
Hack-R

Reputation: 23216

Is this what you're looking for?

data <- sqldf("select a.*, b.* from data1 a left join data2 b on a.date = b.date1")

or possibly:

data <- sqldf("select a.id, a.date from data1 a join data2 b on a.date = b.date1")

Upvotes: 1

Related Questions