Fagui Curtain

Reputation: 1917

data.table and hash -- speed and flexibility to handle multiple values per key

I have 2 questions:

  1. Is hash faster than data.table for Big Data?
  2. How can I deal with multiple values per key, if I want to use a hash-based approach?

I looked at the vignette of the related packages and Googled some potential solutions, but I'm still not sure about the answers to the questions above.

Considering the following post,

R fast single item lookup from list vs data.table vs hash

it seems that a single-item lookup in a data.table object is actually quite slow, even slower than in a base-R list.

A lookup in a hash object from the hash package, on the other hand, is very fast according to that benchmark -- is that accurate?

It also looks like a hash object handles only unique keys: in the following, only 2 (key, value) pairs are created.

> library(hash)
> h <- hash(c("A","B","A"), c(1,2,3))
> h
<hash> containing 2 key-value pair(s).
  A : 3
  B : 2

So, if I have a table of (key, value) pairs where a key can have multiple values, and I want a fast lookup of the values for a given key, what is the best object/data structure in R for that?

Can we still use a hash object, or is data.table the more appropriate choice in this case?

Let's assume we are dealing with very large tables; otherwise this discussion is irrelevant.
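To make the multiple-values-per-key case concrete, here is a minimal sketch of a keyed data.table lookup (the column names `key` and `value` and the toy data are made up for illustration):

```r
library(data.table)

# Toy table where key "A" maps to two values
DT <- data.table(key = c("A", "B", "A"), value = c(1, 2, 3))
setkey(DT, key)      # sort and index by key, enabling binary-search lookups

DT["A"]              # returns both rows whose key is "A"
DT["A", value]       # just the values for that key
```

Unlike a hash, the keyed table keeps every row, so a lookup on "A" naturally returns all of its values.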

Related link: http://www.r-bloggers.com/hash-table-performance-in-r-part-i/

Upvotes: 2

Views: 1252

Answers (1)

Hack-R

Reputation: 23211

You're referring to the question in that SO post, not the answer.

As you'll see in the answer there, benchmark results can change a lot depending on how you use data.table, or any given big-data package.

You are correct that with the simplest use of hash(), each key has one value. There are, of course, workarounds for this. One would be to make the value a string and append your multiple values to it:

h <- hash(c("Key 1","Key 2","Key 3"),c("1","2","1 and 2"))
h
<hash> containing 3 key-value pair(s).
  Key 1 : 1
  Key 2 : 2
  Key 3 : 1 and 2
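The string-append workaround loses the numeric type. Another option is to store a vector under each key; this is a sketch relying only on the fact that the hash package accepts arbitrary R objects as values (the key names are illustrative):

```r
library(hash)

# Store a vector per key instead of pasting values into one string
h <- hash()
h[["Key 1"]] <- 1
h[["Key 2"]] <- 2
h[["Key 3"]] <- c(1, 2)   # two values under one key

h[["Key 3"]]              # retrieves the numeric vector c(1, 2)
```

The values stay numeric, so no parsing is needed on retrieval.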

Another would be to use a hash table via a hashed environment in R, or perhaps via hashmap().
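For example, a hashed environment in base R also places no restriction on what a value can be, so a key can map to a vector of values (a sketch; the key names are illustrative):

```r
# Base-R hashed environment; each value can be any R object,
# including a vector holding multiple values for one key
e <- new.env(hash = TRUE)

assign("A", c(1, 3), envir = e)   # key "A" -> two values
assign("B", 2,       envir = e)

get("A", envir = e)               # the vector c(1, 3)
```

This needs no extra packages, which can matter if you want to avoid dependencies.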

I do not know of a single, definitive proof that hash or data.table will always be faster. It will vary with your use case, your data, and how you implement them in your code.

In general, I'd say that data.table might be the more common solution if your use case does not involve a true key-value pairing, and with it no workaround is needed for multiple values per key.
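If you do want a single row per key in data.table, a list column collapses the multiple values without any string workaround (a sketch using the same toy data as above):

```r
library(data.table)

DT <- data.table(key = c("A", "B", "A"), value = c(1, 2, 3))

# Collapse to one row per key; 'values' becomes a list column
agg <- DT[, .(values = list(value)), by = key]

agg[key == "A", values][[1]]   # the numeric vector of values for "A"
```

Each cell of the list column holds a full numeric vector, so the type of the original values is preserved.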

Upvotes: 1
