Reputation: 6206
I have a very big data frame, with millions of rows. The data frame looks like:
id value ......
111 1
222 4
111 5
333 6
222 8
444 9
555 4
222 2
111 4
Each time, I want to retrieve all the rows for a particular id. If I simply use
df[df$id == myid,]
it could be very costly, because the comparison scans every id in the table.
Is there a way to index a data frame?
Upvotes: 1
Views: 379
Reputation: 3947
The data.table package is designed for exactly this sort of situation.
library(data.table)
dt <- as.data.table(df)
setkey(dt, id) # index the data.table by the id column
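# extract all rows with id == myid; wrap the value in .() because a bare
# number in dt[myid] is treated as a row number, not a key value
dt[.(myid)]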
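Once the key is set, the lookup uses a binary search on the sorted id column instead of scanning every row. data.table can also modify columns by reference (using :=) rather than copying the whole table, so the overhead is very low.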
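As a rough, self-contained sketch of the keyed lookup described above, here it is run on the sample rows from the question. The column names id and value come from the question's header; the lookup value 222 and the variable name res are just illustrative choices, and the ids are assumed to be numeric.

```r
library(data.table)

# Sample rows copied from the question (only the two columns shown there).
dt <- data.table(
  id    = c(111, 222, 111, 333, 222, 444, 555, 222, 111),
  value = c(1, 4, 5, 6, 8, 9, 4, 2, 4)
)

setkey(dt, id)       # sorts dt by id in place and marks id as the key

myid <- 222          # illustrative id to look up
res <- dt[.(myid)]   # keyed lookup: binary search on the sorted key
print(res)

# A bare number in i is a row position, not a key value,
# so this returns the second row of the sorted table.
print(dt[2])
```

If the id column were character rather than numeric, dt["222"] would also be treated as a key lookup, but using .() works for both cases.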
Upvotes: 2