Reputation: 1068
I'm looking for a python-like dictionary structure in R to replace values in a large dataset (>100 MB) and I think data.table package can help me do this. However, I cannot find out an easy way to solve the problem.
For example, I have two data.table:
Table A:
V1 V2
1: A B
2: C D
3: C D
4: B C
5: D A
Table B:
V3 V4
1: A 1
2: B 2
3: C 3
4: D 4
I want to use B as a dictionary to replace the values in A. So the result I want to get is:
Table R:
V5 V6
1 2
3 4
3 4
2 3
4 1
What I did is:
c2=tB[tA[,list(V2)],list(V4)]
c1=tB[tA[,list(V1)],list(V4)]
Although I specified j=list(V4), it still returned me with the values of V3. I don't know why.
c2:
V3 V4
1: B 2
2: D 4
3: D 4
4: C 3
5: A 1
c1:
V3 V4
1: A 1
2: C 3
3: C 3
4: B 2
5: D 4
Finally, I combined the two V4
columns and got the result I want.
But I think there should be a much easier way to do this. Any ideas?
Upvotes: 4
Views: 1408
Reputation: 118839
Here's an alternative way:
setkey(B, V3)
for (i in seq_len(length(A))) {
thisA = A[[i]]
set(A, j=i, value=B[thisA]$V4)
}
# V1 V2
# 1: 1 2
# 2: 3 4
# 3: 3 4
# 4: 2 3
# 5: 4 1
Since thisA
is character column, we don't need the J()
(for convenience). Here, A
's columns are replaced by reference, and is therefore also memory efficient. But if you don't want to replace A
, then you can just use cA <- copy(A)
and replace cA
's columns.
Alternatively, using :=
:
A[, names(A) := lapply(.SD, function(x) B[J(x)]$V4)]
# or
ans = copy(A)[, names(A) := lapply(.SD, function(x) B[J(x)]$V4)]
(Following user2923419's comment): You can drop the J()
if the lookup is a single column of type character (just for convenience).
In 1.9.3, when j
is a single column, it returns a vector (based on user request). So, it's a bit more natural data.table syntax:
setkey(B, V3)
for (i in seq_len(length(A))) {
thisA = A[[i]]
set(A, j=i, value=B[thisA, V4])
}
Upvotes: 3
Reputation: 4615
I am not sure how fast this is with big data, but chmatch
is supposed to be fast.
tA[ , lapply(.SD,function(x) tB$V4[chmatch(x,tB$V3)])]
V1 V2
1: 1 2
2: 3 4
3: 3 4
4: 2 3
5: 4 1
Upvotes: 0