Reputation: 1208
How can I sample a subset of rows from a large data.table (data.table
package)? Is there a more elegant way to do the following?
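library(data.table)
# Build the columns directly; wrapping them in cbind() would coerce value to character
DT <- data.table(site = rep(letters[1:2], 1000), value = runif(2000))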
DT[site=="a"][sample(1:nrow(DT[site=="a"]), 100)]
I guess there is a simple solution, but I can't find the right wording to search for.
UPDATE:
More generally, how can I access the row number in data.table's i
argument without creating a temporary column for the row number?
Upvotes: 5
Views: 3498
Reputation: 55420
One of the biggest benefits of using data.table is that you can set a key on your data.
Using the key together with .I (a built-in variable; see ?data.table for more info), you can write:
setkey(DT, site)
DT[DT["a", sample(.I, 100)]]
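Here is a self-contained sketch of the same idea that does not depend on the key. It builds the asker's table without cbind (an assumption, so that value stays numeric), fixes a seed purely for reproducibility, and collects the row numbers in j with .I. As far as I know, .I evaluated alongside an i subset counts rows within that subset, so the keyed form above appears to rely on the "a" rows sorting first; taking .I over the whole table avoids depending on that.

```r
library(data.table)

set.seed(42)  # seed chosen only to make the sketch reproducible
DT <- data.table(site = rep(letters[1:2], 1000), value = runif(2000))

# Row numbers (in the full table) of the rows where site == "a";
# no temporary row-number column is created
idx <- DT[, .I[site == "a"]]

# Draw 100 of those row numbers and subset the table once
smp <- DT[sample(idx, 100)]
```

This leaves DT unsorted, which may matter if the original row order carries meaning.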
As for your second question, "how can I access a row number in data.table's i argument":
# Just use the number directly:
DT[17]
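If the goal is to get the row numbers that a condition in i selects, rather than to pass in a known number, the which argument of [.data.table may be closer to what was asked. A minimal sketch, assuming the asker's two-site table (rebuilt here without cbind):

```r
library(data.table)

DT <- data.table(site = rep(letters[1:2], 1000), value = runif(2000))

# which = TRUE returns the matching row numbers instead of the rows themselves
rows_a <- DT[site == "a", which = TRUE]

# Those row numbers can be fed straight back into i
smp <- DT[sample(rows_a, 100)]
```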
Upvotes: 5
Reputation: 22353
Using which, you can find the row numbers directly. Instead of sampling from 1:nrow(...),
you can sample from the rows that have the desired property. In your example, you can use the following:
DT[sample(which(site=="a"), 100)]
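One caveat with this pattern: when which() returns a single row number k, sample(k, n) samples from 1:k rather than from the one-element vector. A hedged sketch of a wrapper that avoids this by sampling positions with sample.int; sample_rows is a hypothetical helper name, not part of data.table:

```r
library(data.table)

# Hypothetical helper (not a data.table function): sample up to n rows
# among those where the logical vector `keep` is TRUE.
sample_rows <- function(dt, keep, n) {
  idx <- which(keep)                    # row numbers where the condition holds
  n <- min(n, length(idx))              # never ask for more rows than match
  dt[idx[sample.int(length(idx), n)]]   # sample positions, then map to row numbers
}

DT <- data.table(site = rep(letters[1:2], 1000), value = runif(2000))

res <- sample_rows(DT, DT$site == "a", 100)

# Only one row matches here; sample(which(...), 1) could pick any earlier row instead
one <- sample_rows(DT, DT$value == max(DT$value), 5)
```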
Upvotes: 4