lserlohn

Reputation: 6206

Efficiently indexing and retrieving data from a large data frame

I have a very large data frame with millions of rows. It looks like this:

id  value ......
111  1  
222  4
111  5
333  6
222  8
444  9
555  4
222  2
111  4

Each time, I want to retrieve all of the rows for a particular id. If I simply use

df[df$id == myid,]

it could be very costly, because the comparison scans every id in the data frame.

Is there a way to index a data frame?

Upvotes: 1

Views: 379

Answers (1)

Jonathan Carroll

Reputation: 3947

The data.table package is designed to work with exactly this sort of situation.

library(data.table)
dt <- as.data.table(df)
setkey(dt, id) # index the data.table by the id column

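# extract all rows with id == myid; .() is needed when the key is numeric,
# because dt[myid] with a bare number selects row number myid instead
dt[.(myid)]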

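setkey sorts the table by id, so the keyed lookup uses binary search instead of scanning the whole column the way df[df$id == myid,] does. data.table can also modify data by reference (rather than by value), which keeps overhead very low.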
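
As a minimal sketch of the keyed lookup on the sample data from the question (assuming df has only the id and value columns shown there, and using a placeholder myid of 222), it could look like this:

```r
library(data.table)

# Sample data copied from the question; only id and value are shown there.
df <- data.frame(
  id    = c(111, 222, 111, 333, 222, 444, 555, 222, 111),
  value = c(1, 4, 5, 6, 8, 9, 4, 2, 4)
)

dt <- as.data.table(df)
setkey(dt, id)   # sorts dt by id in place and marks id as the key

myid <- 222      # placeholder id, chosen only for illustration

# .() wraps the value so it is matched against the key column.
# A bare number, as in dt[222], would be read as a row number.
res <- dt[.(myid)]
print(res)

# The on= argument does the same lookup without setting a key first.
res_on <- as.data.table(df)[.(myid), on = "id"]
print(res_on)
```

The on= form leaves the row order of the table untouched, which may be preferable if the original order matters; setkey pays the sorting cost once and is then cheap for repeated lookups.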

Upvotes: 2
