Reputation: 7109
How do I choose the right keys for data.table objects?
Are the considerations similar to those for RDBMSs? My first guess was to look for documentation about indexes and keys for RDBMSs. Google came up with this helpful Stack Overflow question related to Oracle.
Do the considerations from that answer apply to data.table objects? Perhaps with the exception of those relating to UPDATE, INSERT or DELETE type statements? I'm guessing that our data.table objects won't really be used in that way.
I'm trying to get my head around this stuff by using the documentation and examples, but I haven't seen any discussion on key selection.
PS: Thanks to @crayola for pointing me toward the data.table package in the first place!
Upvotes: 4
Views: 566
Reputation: 1678
I am not sure this is a very helpful answer, but since you mention me in the question I'll say what I think anyway. But remember that I am a bit of a data.table newbie myself.
I personally only use keys when there is a clear benefit, e.g. when merging data.tables, or when it seems clear that a key will speed things up (e.g. subsetting repeatedly on the same variable).
But to my knowledge, there is sometimes no real need to define keys at all; data.table is already faster than data.frame even without keys.
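To make the two cases above (merging, and subsetting repeatedly on one variable) a bit more concrete, here is a rough sketch. The tables and column names (orders, customers, customer_id and so on) are made up for illustration, so treat it as one possible way to use keys rather than a recommendation:

```r
library(data.table)

# Hypothetical example data; table and column names are invented.
orders <- data.table(
  customer_id = c(3L, 1L, 2L, 1L, 3L),
  amount      = c(20, 35, 10, 5, 50)
)
customers <- data.table(
  customer_id = 1:3,
  region      = c("north", "south", "east")
)

# setkey() sorts each table by the key column, by reference (no copy).
setkey(orders, customer_id)
setkey(customers, customer_id)

# Subsetting on the key uses binary search rather than a full vector scan,
# which is where repeated lookups on the same variable can pay off.
orders_c1 <- orders[.(1L)]

# Keyed join: for each row of orders, look up the matching row of customers.
merged <- customers[orders]
```

If you only subset or merge on a column once or twice, the cost of sorting the table with setkey() may not be worth it, which is why I tend to add keys only when the benefit is clear.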
Upvotes: 2