Reputation: 79
I have a data frame with thousands of rows and 3 columns: value, experiment and ratio. value contains numbers (both positive and negative), experiment contains the experiment number (E1, E2 or E3), and ratio contains one of three terms (X.Y, Y.Z or Z.X).
For each of the three ratios, I need to extract all columns for the 50 values closest to 0, bearing in mind that this is very likely to be a mixture of positive and negative values.
The only (naive) way I can think of is to subset/extract the data for each ratio, then sort (order) it based on value, and subset again to get the 25 negative values closest to 0 and 25 positive values closest to 0.
Any better way?
Upvotes: 0
Views: 1106
Reputation: 121568
A data.table solution in case you have many rows:
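library(data.table)
set.seed(1)
N <- 1e6
# Simulated data with the three columns described in the question
dat <- data.table(value = runif(N, -100, 100),
                  experiment = sample(paste0('E', 1:3), N, replace = TRUE),
                  ratio = sample(c('X.Y', 'Y.Z', 'Z.X'), N, replace = TRUE))
# For each ratio, keep the 50 rows whose value is closest to 0
dat[, {id <- order(abs(value))[1:50]
       list(value = value[id],
            experiment = experiment[id])
      }, by = 'ratio']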
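Since the question asks for all columns, here is a possible variant of the same idea that avoids listing the columns by hand. It is only a sketch: it assumes the data.table package is installed, uses a smaller N than above to keep it quick, and the name res is just for illustration.

```r
library(data.table)

set.seed(1)
N <- 1e4  # smaller than the 1e6 used above, only to keep the sketch fast
dat <- data.table(
  value = runif(N, -100, 100),
  experiment = sample(paste0("E", 1:3), N, replace = TRUE),
  ratio = sample(c("X.Y", "Y.Z", "Z.X"), N, replace = TRUE)
)

# Sort rows by distance from 0, then keep the first 50 rows of each ratio.
# .SD holds every non-grouping column, so no column has to be named explicitly.
res <- dat[order(abs(value)), head(.SD, 50), by = ratio]
```

Using head() rather than [1:50] also means a ratio with fewer than 50 rows simply returns what it has instead of padding with NA.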
Upvotes: 2
Reputation: 9687
My solution uses by to split the data frame by ratio, then order and [ to pick the rows closest to 0:
by(df, df$RATIO, function(x) x[ order(abs(x$VALUE))[1:50] , ] )
This will return a list (an object of class by) with one element per ratio, each containing the 50 rows closest to 0.
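If a single data frame is more convenient than a list, the pieces can be stacked afterwards. A minimal base R sketch, assuming upper-case column names as in the line above (the question itself uses lower-case names, so adjust to your data); the toy data and the names closest and result are made up for illustration:

```r
# Toy data; column names are an assumption that follows the answer above.
set.seed(1)
df <- data.frame(
  VALUE = runif(600, -100, 100),
  EXPERIMENT = sample(paste0("E", 1:3), 600, replace = TRUE),
  RATIO = sample(c("X.Y", "Y.Z", "Z.X"), 600, replace = TRUE)
)

# Per ratio, take the rows with the smallest absolute value.
# head() avoids NA rows when a group has fewer than 50 observations.
closest <- by(df, df$RATIO, function(x) x[head(order(abs(x$VALUE)), 50), ])

# Stack the per-ratio pieces into one data frame.
result <- do.call(rbind, closest)
```

Because the rows are ranked by abs(VALUE), the 50 rows per ratio can be any mix of positive and negative values, not necessarily 25 of each.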
Upvotes: 3