H.Barras
H.Barras

Reputation: 3

Filling table column by matching several rows from different tables with data.table

I need help finding a solution to the following problem:

Here is an example data set:

library(data.table)
x_coord <- rep(sort(rep(c(1:3),3)),2)
y_coord <- rep(c(1:3),6)
time_info <- c(rep(strptime("201701010000", tz = "UTC", format = "%Y%m%d%H%M"),9), rep(strptime("201701010005", tz = "UTC", format = "%Y%m%d%H%M"),9))

table1 <- data.table(x = x_coord, y = y_coord, time = time_info)

table2 <- data.table(x = c(1,1,3,2), y = c(1,1,3,1),time = rep(time_info[1], 4), values = c(3,5,8,6))
# table2 has one unique time-value

The aim is to fill in table1 with the values from table2 so that it looks like this:

table3 <- table1
for (i in c(1:nrow(table2))) {
table3[x == table2$x[i] & y == table2$y[i] & time == table2$time[i],"values" := .(table2$values[i])]
}

There are several constraints:

  1. table1 does not contain all coordinates from table2 and the reverse. Each coordinates/time combination has only one corresponding value so there is no issue with multiple values per coordinate/time.

  2. I can't do it manually, since "table1" has in reality 2*10^7 rows and I want to get the values from almost 100'000 "table2"s which each have the size of about 10000 rows and which come from a different file each.

I've tried merging but then when it's Looping, it adds a new "values"-column at each Iteration (and on top of that it's taking really long). I've also tried matching with

table1[time == table2$time[1] & paste0(x,y) %in% paste0(table2$x,table2$y), "values" := .(table2$values)] 

but then I am not sure that the right value is assigned to the right coordinate.

I hope my question is clear, sorry if not! Thank you very much for helping.

Upvotes: 0

Views: 583

Answers (2)

Jaap
Jaap

Reputation: 83225

A join with data.table:

table1[table2, on = .(x, y, time), values := values][]

which gives:

    x y                time values
 1: 1 1 2017-01-01 01:00:00      5
 2: 1 2 2017-01-01 01:00:00     NA
 3: 1 3 2017-01-01 01:00:00     NA
 4: 2 1 2017-01-01 01:00:00      6
 5: 2 2 2017-01-01 01:00:00     NA
 6: 2 3 2017-01-01 01:00:00     NA
 7: 3 1 2017-01-01 01:00:00     NA
 8: 3 2 2017-01-01 01:00:00     NA
 9: 3 3 2017-01-01 01:00:00      8
10: 1 1 2017-01-01 01:05:00     NA
11: 1 2 2017-01-01 01:05:00     NA
12: 1 3 2017-01-01 01:05:00     NA
13: 2 1 2017-01-01 01:05:00     NA
14: 2 2 2017-01-01 01:05:00     NA
15: 2 3 2017-01-01 01:05:00     NA
16: 3 1 2017-01-01 01:05:00     NA
17: 3 2 2017-01-01 01:05:00     NA
18: 3 3 2017-01-01 01:05:00     NA

Upvotes: 1

Andrew Chisholm
Andrew Chisholm

Reputation: 6567

If you're prepared to use the dplyr package you could do this.

library(dplyr)
table3 = table1 %>% left_join(table2)

Upvotes: 0

Related Questions