lovetl2002
lovetl2002

Reputation: 1068

How to create a python-style dictionary by data.table in R?

I'm looking for a python-like dictionary structure in R to replace values in a large dataset (>100 MB) and I think data.table package can help me do this. However, I cannot find out an easy way to solve the problem.

For example, I have two data.table:

Table A:

   V1 V2
1:  A  B
2:  C  D
3:  C  D
4:  B  C
5:  D  A

Table B:

   V3 V4
1:  A  1
2:  B  2
3:  C  3
4:  D  4

I want to use B as a dictionary to replace the values in A. So the result I want to get is:

Table R:

V5 V6
 1  2
 3  4
 3  4
 2  3
 4  1

What I did is:

c2=tB[tA[,list(V2)],list(V4)]
c1=tB[tA[,list(V1)],list(V4)]

Although I specified j=list(V4), it still returned me with the values of V3. I don't know why.

c2:

   V3 V4
1:  B  2
2:  D  4
3:  D  4
4:  C  3
5:  A  1

c1:

   V3 V4
1:  A  1
2:  C  3
3:  C  3
4:  B  2
5:  D  4

Finally, I combined the two V4 columns and got the result I want.

But I think there should be a much easier way to do this. Any ideas?

Upvotes: 4

Views: 1408

Answers (2)

Arun
Arun

Reputation: 118839

Here's an alternative way:

setkey(B, V3)
for (i in seq_len(length(A))) {
    thisA = A[[i]]
    set(A, j=i, value=B[thisA]$V4)
}
#    V1 V2
# 1:  1  2
# 2:  3  4
# 3:  3  4
# 4:  2  3
# 5:  4  1

Since thisA is character column, we don't need the J() (for convenience). Here, A's columns are replaced by reference, and is therefore also memory efficient. But if you don't want to replace A, then you can just use cA <- copy(A) and replace cA's columns.


Alternatively, using :=:

A[, names(A) := lapply(.SD, function(x) B[J(x)]$V4)]
# or
ans = copy(A)[, names(A) := lapply(.SD, function(x) B[J(x)]$V4)]

(Following user2923419's comment): You can drop the J() if the lookup is a single column of type character (just for convenience).


In 1.9.3, when j is a single column, it returns a vector (based on user request). So, it's a bit more natural data.table syntax:

setkey(B, V3)
for (i in seq_len(length(A))) {
    thisA = A[[i]]
    set(A, j=i, value=B[thisA, V4])
}

Upvotes: 3

Mike.Gahan
Mike.Gahan

Reputation: 4615

I am not sure how fast this is with big data, but chmatch is supposed to be fast.

tA[ , lapply(.SD,function(x) tB$V4[chmatch(x,tB$V3)])]

   V1 V2
1:  1  2
2:  3  4
3:  3  4
4:  2  3
5:  4  1

Upvotes: 0

Related Questions