Reputation: 14604
I have a data table:
> COUNT_ID_CATEGORY
id 706 799 1703 1726 2119 2202 3203 3504 3509 4401 4517 5122 5558 5616 5619 5824 6202 7205 9115 9909
1: 86246 9 0 15 4 28 0 15 63 39 5 7 25 27 43 12 64 1 16 0 96
2: 86252 3 0 17 6 21 0 6 62 24 6 7 12 25 32 6 49 1 26 0 103
3: 12262064 3 0 1 1 12 0 0 2 1 0 0 0 2 4 0 4 0 0 0 12
4: 12277270 2 0 0 0 1 0 3 0 3 0 0 0 0 24 0 6 2 5 0 60
5: 12332190 2 0 2 0 4 0 1 2 0 0 0 1 0 3 0 1 3 2 0 46
---
310661: 4837642552 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0
310662: 4843417324 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0
310663: 4847628950 2 0 1 1 16 0 0 2 3 0 0 2 9 5 0 3 3 2 3 14
310664: 4847787712 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
310665: 4853598737 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0
> class(COUNT_ID_CATEGORY)
[1] "data.table" "data.frame"
>
and I wish to read the data as quickly as possible as follows:
COUNT_ID_CATEGORY for (id == 86246) & (category == 706)
which should return the value 9 (top left in the table). (for example)
I can get the row with:
COUNT_ID_CATEGORY[id==86246,]
but how do I get the column?
> dput(head(COUNT_ID_CATEGORY))
structure(list(id = c(86246, 86252, 12262064, 12277270, 12332190,
12524696), `706` = c(9L, 3L, 3L, 2L, 2L, 0L), `799` = c(0L, 0L,
0L, 0L, 0L, 0L), `1703` = c(15L, 17L, 1L, 0L, 2L, 0L), `1726` = c(4L,
6L, 1L, 0L, 0L, 0L), `2119` = c(28L, 21L, 12L, 1L, 4L, 0L), `2202` = c(0L,
0L, 0L, 0L, 0L, 0L), `3203` = c(15L, 6L, 0L, 3L, 1L, 0L), `3504` = c(63L,
62L, 2L, 0L, 2L, 11L), `3509` = c(39L, 24L, 1L, 3L, 0L, 3L),
`4401` = c(5L, 6L, 0L, 0L, 0L, 1L), `4517` = c(7L, 7L, 0L,
0L, 0L, 1L), `5122` = c(25L, 12L, 0L, 0L, 1L, 0L), `5558` = c(27L,
25L, 2L, 0L, 0L, 1L), `5616` = c(43L, 32L, 4L, 24L, 3L, 18L
), `5619` = c(12L, 6L, 0L, 0L, 0L, 0L), `5824` = c(64L, 49L,
4L, 6L, 1L, 10L), `6202` = c(1L, 1L, 0L, 2L, 3L, 6L), `7205` = c(16L,
26L, 0L, 5L, 2L, 4L), `9115` = c(0L, 0L, 0L, 0L, 0L, 0L),
`9909` = c(96L, 103L, 12L, 60L, 46L, 1L)), .Names = c("id",
"706", "799", "1703", "1726", "2119", "2202", "3203", "3504",
"3509", "4401", "4517", "5122", "5558", "5616", "5619", "5824",
"6202", "7205", "9115", "9909"), sorted = "id", class = c("data.table",
"data.frame"), row.names = c(NA, -6L), .internal.selfref = <pointer: 0x043a24a0>)
Upvotes: 2
Views: 2161
Reputation: 3627
First setkey for fast lookup using data.table
's binary search/subset feature:
setkey(COUNT_ID_CATEGORY, id)
Then you can do:
COUNT_ID_CATEGORY[J(86246)][, '706']
The first part COUNT_ID_CATEGORY[J(86246)]
performs fast subset using binary search. You can read more about J(.)
and what it does here.
The next part [, '706', with=FALSE]
takes the subset result, which is a data.table
and selects just the column 706
.
Just to be complete, this post shows more ways of selecting/subsetting columns from a data.table
.
Upvotes: 3