CptNemo
CptNemo

Reputation: 6755

data.table: merge() one-key table with two-keys table

I refer also to this questions that is now almost 3 years old. I have a similar problem.

I have these three tables:

require(data.table)

profile <- data.table(id = seq(11,20))
setkey(profile, "id")

post <- data.table(id = seq(1,10))
setkey(post, "id")

comment <- data.table(post_id = seq(1,10),
                      profile_id = seq(11,20))
setkeyv(comment, c("post_id","profile_id"))

Now I want to merge profile and post with comment into two different tables. How do I specify that the key to match in comment is profile_id and in post is post_id? Should I respecify how tables are built?

Upvotes: 4

Views: 240

Answers (2)

MichaelChirico
MichaelChirico

Reputation: 34703

I don't think I understand your question. What do you expect exactly?

Naming of the keys shouldn't matter--in x[y] the first keys of x and y are matched (and then the second keys, etc.).

What about the following is different from what you expect? I've edited your example tables to make it clear exactly what's happening:

profile <- data.table(id = seq(11,20), v_prof = sample(100:109), key = "id")

post <- data.table(id = seq(1,10), v_pos t= sample(200:209), key = "id")

comment <- data.table(post_id = seq(1, 10),
                      profile_id = seq(11, 20))

setkey(comment, post_id)[post]
#(the reverse would work too: post[setkey(comment, post_id)] )
    post_id profile_id v_post
 1:       1         11    207
 2:       2         12    208
 3:       3         13    201
 4:       4         14    205
 5:       5         15    206
 6:       6         16    200
 7:       7         17    202
 8:       8         18    203
 9:       9         19    209
10:      10         20    204

setkey(comment, profile_id)[profile]
    post_id profile_id v_prof
 1:       1         11    107
 2:       2         12    101
 3:       3         13    109
 4:       4         14    102
 5:       5         15    103
 6:       6         16    104
 7:       7         17    100
 8:       8         18    108
 9:       9         19    105
10:      10         20    106

Upvotes: 1

Helix123
Helix123

Reputation: 3687

Unfortunately, merge by.x= by.y= as in merge for data.frame is not implemented for data.tables yet. It will be adressed in the next release 1.9.6. See here: https://github.com/Rdatatable/data.table/issues/637.

What you can do is convert your data.tables to data.frames, do the merge there using by.x and by.y (see ?merge.data.frame for that) and then convert back to data.tables.

Alternatively, name the keys/columns in a way they match on the name basis. Then, data.table's merge should work.

setnames(profile, "id", "profile_id")
setnames(post,    "id", "post_id")

merged_dt1 <- merge(profile, comment)
merged_dt2 <- merge(post, comment)

Upvotes: 3

Related Questions