syebill
syebill

Reputation: 543

Bind rows based on conditions with data.table

I have 2 data tables that I would like to merge by binding rows. The dput's are given below:

    >dput(x)
    structure(list(NN = c("test", "test", "test", "test", "test"), 
    gpn = c("gas", "gas", "gas", "gas", "gas"), sitename = c("agip", 
    "agip", "agip", "agip", "agip"), code = c("F123", "F123", 
    "F123", "F123", "F123"), Brand = c("bp", "shell", "ionic", 
    "meta", "robo"), starttime = structure(c(1533097609, 1533118867, 
    1533183999, 1533270359, 1533356437), class = c("POSIXct", 
    "POSIXt"), tzone = ""), endtime = structure(c(1533118867, 
    1533183999, 1533270359, 1533356437, 1533443068), class = c("POSIXct", 
    "POSIXt"), tzone = ""), price = c(1.592, 1.592, 1.598, 1.598, 
    1.598), otherprice = c(1.998, 2.11, 2.114, 2.134, 1.989)), .Names = c("NN", 
"gpn", "sitename", "code", "Brand", "starttime", "endtime", "price", 
"otherprice"), row.names = c(NA, -5L), class = c("data.table", 
"data.frame"), .internal.selfref = <pointer: 0x00000000025c0788>)



>dput(y)

structure(list(NN = c("test", "test", "test", "test", "test"), 
gpn = c("gas", "gas", "gas", "gas", "gas"), sitename = c("agip", 
"agip", "agip", "agip", "agip"), code = c("F123", "F123", 
"F123", "F123", "F123"), Brand = c("dino", "dino", "dino", 
"dino", "dino"), otherprice = c(1.987, 2.456, 3.456, 1.989, 
1.99), starttime = structure(c(1533097399, 1533118999, 1533184999, 
1533271999, 1533359357), tzone = "", class = c("POSIXct", 
"POSIXt")), endtime = structure(c(1533118100, 1533182999, 
1533270000, 1533356000, 1533443000), class = c("POSIXct", 
"POSIXt"), tzone = "")), .Names = c("NN", "gpn", "sitename", 

"code", "Brand", "otherprice", "starttime", "endtime"), class = c("data.table", "data.frame"), .internal.selfref = )

I would like to bind rows from y to x such that the following matches:

pseudo code:

If(NN, code from 'x' matches NN, code from 'y') & 'x$starttime' < 'y$starttime' & 'x$endtime' > 'y$endtime'

Then get NN, code, brand, price from 'y' and the resulting frame will become:

NN    gpn    sitename   code   Brand        starttime          endtime price otherprice
 test gas     agip      F123    bp    01/08/2018 05:26 01/08/2018 11:21 1.592      1.998
 test gas     agip      F123   shell 01/08/2018 11:21 02/08/2018 05:26 1.592      2.110
 test gas     agip      F123   Ionic 02/08/2018 05:26 03/08/2018 05:25 1.598      2.114
 test gas     agip      F123    meta 03/08/2018 05:25 04/08/2018 05:20 1.598      2.134
 test gas     agip      F123    robo 04/08/2018 05:20 05/08/2018 05:24 1.598      1.989
 test gas     agip      F123    dino 01/08/2018 05:26 01/08/2018 11:21 1.592      1.987
 test gas     agip      F123    dino 01/08/2018 11:21 02/08/2018 05:26 1.592      2.456
 test gas     agip      F123    dino 02/08/2018 05:26 03/08/2018 05:25 1.598      3.456
 test gas     agip      F123    dino 03/08/2018 05:25 04/08/2018 05:20 1.598      1.989
test gas     agip       F123    dino 04/08/2018 05:20 05/08/2018 05:24 1.598      1.990

I have checked InsertRow and rbind but neither works.

Any ideas?

Thanks.

Upvotes: 1

Views: 1327

Answers (2)

Andrew Royal
Andrew Royal

Reputation: 346

To combine the datasets it seems that a good solution is to just rbind the datasets while using merge to condition the datasets on the start/end time restrictions. In the example below, the rbind function nests the merge conditioning for the start and end dates:

library('data.table')
z <- rbind(
  x = x,
  y = merge(x[, .(code, starttime.x = starttime, endtime.x = endtime, price)], y, 
            by = 'code',
            allow.cartesian = TRUE,
            all.x = TRUE)[starttime.x < starttime & endtime.x > endtime,
                          .SD,
                          .SDcols = c(paste0(names(y)), 'price')])
z

Output:

     NN gpn sitename code Brand           starttime             endtime price otherprice
1: test gas     agip F123    bp 2018-08-01 00:26:49 2018-08-01 06:21:07 1.592      1.998
2: test gas     agip F123 shell 2018-08-01 06:21:07 2018-08-02 00:26:39 1.592      2.110
3: test gas     agip F123 ionic 2018-08-02 00:26:39 2018-08-03 00:25:59 1.598      2.114
4: test gas     agip F123  meta 2018-08-03 00:25:59 2018-08-04 00:20:37 1.598      2.134
5: test gas     agip F123  robo 2018-08-04 00:20:37 2018-08-05 00:24:28 1.598      1.989
6: test gas     agip F123  dino 2018-08-01 06:23:19 2018-08-02 00:09:59 1.592      2.456
7: test gas     agip F123  dino 2018-08-02 00:43:19 2018-08-03 00:20:00 1.598      3.456
8: test gas     agip F123  dino 2018-08-03 00:53:19 2018-08-04 00:13:20 1.598      1.989
9: test gas     agip F123  dino 2018-08-04 01:09:17 2018-08-05 00:23:20 1.598      1.990

Upvotes: 1

NM_
NM_

Reputation: 1999

We can merge using dplyr, however there are few caveats due to the structure provided. From dput(), we see that y does not have the sitename variable, starttime and starttime are factors and need to be converted to posixct, and the variable brand in y has a lowercase "b" but Brand in x has an upper-case "B".

Accommodating for these observations, we can do the following:

# Change time to posixct to be consistent with x. 
# NOTE: You may have to adjust the time zone
y$starttime = as.POSIXct(y$starttime)
y$endtime = as.POSIXct(y$endtime)

# Chance 'brand' to 'Brand'
colnames(y)[colnames(y)=="brand"] = "Brand"

# Change other variabes as mentioned in commen
y$otherprice = y$price
y$price = x$price
y$sitename = x$sitename

We can create subsets which satisfy the criteria mentioned (i.e. select the subset of x and y where x$starttime < y$starttime and x$endtime > y$endtime)

# Create subsets which match criteria
x.subset = subset(x, subset = x$starttime < y$starttime & x$endtime > y$endtime)
y.subset = subset(y, subset = x$starttime < y$starttime & x$endtime > y$endtime)

Now we can bind the subsets together

# Row bind
library(dplyr)
z = dplyr::bind_rows(x.subset, y.subset)
> z
      NN gpn sitename code Brand           starttime             endtime price otherprice
 1: test gas     agip F123    bp 2018-08-01 00:26:49 2018-08-01 06:21:07 1.592      1.998
 2: test gas     agip F123 shell 2018-08-01 06:21:07 2018-08-02 00:26:39 1.592      2.110
 3: test gas     agip F123 ionic 2018-08-02 00:26:39 2018-08-03 00:25:59 1.598      2.114
 4: test gas     agip F123  meta 2018-08-03 00:25:59 2018-08-04 00:20:37 1.598      2.134
 5: test gas     agip F123  robo 2018-08-04 00:20:37 2018-08-05 00:24:28 1.598      1.989
 6: test gas     agip F123  dino 2018-08-01 00:33:19 2018-08-01 06:08:20 1.592      1.987
 7: test gas     agip F123  dino 2018-08-01 06:23:19 2018-08-02 00:09:59 1.592      2.456
 8: test gas     agip F123  dino 2018-08-02 00:43:19 2018-08-03 00:20:00 1.598      3.456
 9: test gas     agip F123  dino 2018-08-03 00:53:19 2018-08-04 00:13:20 1.598      1.989
10: test gas     agip F123  dino 2018-08-04 01:09:17 2018-08-05 00:23:20 1.598      1.990

Upvotes: 1

Related Questions