Reputation: 2763
I have the following data tables:
> dat1
ID1 ID2 distance
1: 1 1117 3022.22
2: 1 386 16107.74
3: 1 920 17327.00
4: 1 91 24691.42
5: 1 227 29459.34
> dat2
ID1 ID2 common_date_begin common_date_end
1: 2 1 2000-01-01 2006-12-03
2: 3 1 2000-05-01 2006-12-03
3: 3 2 2000-05-01 2006-12-03
4: 1117 1 2000-01-01 2006-12-03
5: 4 2 2000-01-01 2006-12-03
dat1
shows some meteorological stations (ID1
), their nearest neighbor stations (ID2
), and the distance between them.
dat2
shows the common periods of data availability for each pair of stations. The problem is, some stations are ID1
in dat1
and are ID2
in dat2
. For example, stations 1
and 1117
.
I want to join dat1
and dat2
using ID1
and ID2
just as pairs, with no specified order. I am doing:
dat1[dat2, on=.(ID1, ID2), (cols) := mget(paste0("i.", cols))][,
diff_days := common_date_end - common_date_begin][]
ID1 ID2 distance common_date_begin common_date_end diff_days
1: 1 1117 3022.22 <NA> <NA> NA days
2: 1 386 16107.74 <NA> <NA> NA days
3: 1 920 17327.00 <NA> <NA> NA days
4: 1 91 24691.42 <NA> <NA> NA days
5: 1 227 29459.34 <NA> <NA> NA days
which doesn't work, because 1
and 1117
are under distinct identifiers.
However, the result I am looking for is:
ID1 ID2 distance common_date_begin common_date_end diff_days
1: 1 1117 3022.22 2000-01-01 2006-12-03 2528 days
2: 1 386 16107.74 <NA> <NA> NA days
3: 1 920 17327.00 <NA> <NA> NA days
4: 1 91 24691.42 <NA> <NA> NA days
5: 1 227 29459.34 <NA> <NA> NA days
Is this achievable using data.table
?
Here is the data to reproduce my problem:
library(data.table)
dat1 <- read.table(text="ID1 ID2 distance
1 1117 3022.22
1 386 16107.74
1 920 17327.00
1 91 24691.42
1 227 29459.34", header=T, stringsAsFactors=F)
dat1 <- data.table(dat1)
dat2 <- read.table(text="ID1 ID2 common_date_begin common_date_end
2 1 2000-01-01 2006-12-03
3 1 2000-05-01 2006-12-03
3 2 2000-05-01 2006-12-03
1117 1 2000-01-01 2006-12-03
4 2 2000-01-01 2006-12-03", header=T, stringsAsFactors=F)
dat2 <- data.table(dat2)
# convert character to dates
cols <- c("common_date_begin", "common_date_end")
dat2[, (cols) := lapply(.SD, as.Date), .SDcols = cols]
Upvotes: 1
Views: 49
Reputation: 25223
You can sort the columns first before joining:
dat1[, paste0("ID", 1L:2L) := .(pmin(ID1, ID2), pmax(ID1, ID2))]
dat2[, paste0("ID", 1L:2L) := .(pmin(ID1, ID2), pmax(ID1, ID2))]
Upvotes: 3