Kvothe

Reputation: 79

Data subsetting in R

I have a data frame with thousands of rows and 3 columns: value, experiment and ratio. value holds numbers (both positive and negative), experiment holds the experiment number (E1, E2 or E3), and ratio holds one of three terms (X.Y, Y.Z or Z.X).

For each of the three ratios, I need to extract all columns for the 50 rows whose values are closest to 0, bearing in mind that this is very likely to be a mixture of positive and negative values.

The only (naive) way I can think of is to subset/extract the data for each ratio, then sort (order) it based on value, and subset again to get the 25 negative values closest to 0 and 25 positive values closest to 0.

Any better way?

Upvotes: 0

Views: 1106

Answers (2)

agstudy

Reputation: 121568

A data.table solution in case you have many rows:

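library(data.table)

# Simulated data: 1e6 rows, values spread on both sides of 0
set.seed(1)
N <- 1e6
dat <- data.table(value = runif(N, -100, 100),
                  experiment = sample(paste0('E', 1:3), N, replace = TRUE),
                  ratio = sample(c('X.Y', 'Y.Z', 'Z.X'), N, replace = TRUE))

# Within each ratio, keep the (up to) 50 rows whose value is closest to 0
dat[, {id <- head(order(abs(value)), 50)
       list(value = value[id],
            experiment = experiment[id])
      }, by = 'ratio']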
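
If every column is wanted without naming each one, a possible variant is to sort the whole table once by distance from 0 and then take the first 50 rows of each group with .SD. This is only a sketch on a small made-up table (the name toy and its size are assumptions for illustration, not part of the answer above):

```r
library(data.table)

# Small made-up table with the question's three columns
set.seed(1)
toy <- data.table(value = runif(3000, -100, 100),
                  experiment = sample(paste0("E", 1:3), 3000, replace = TRUE),
                  ratio = sample(c("X.Y", "Y.Z", "Z.X"), 3000, replace = TRUE))

# Sort all rows by distance from 0, then take the first 50 rows of each ratio
res <- toy[order(abs(value)), head(.SD, 50), by = ratio]

# Per-ratio summary: row count and the largest distance from 0 that was kept
res[, .(n = .N, furthest = max(abs(value))), by = ratio]
```

Because head() stops at the end of a group, a ratio with fewer than 50 rows simply returns all of its rows instead of padding with NA.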

Upvotes: 2

Neal Fultz

Reputation: 9687

My solution uses by to split the data frame by ratio, then order on abs(value) within each group:

by(df, df$ratio, function(x) x[head(order(abs(x$value)), 50), ])

This returns an object of class by, which behaves like a list with one element per ratio, each element being the data frame subset for that ratio.
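
If a single data frame is more convenient than a list, the pieces can be stacked again with rbind. The sketch below uses split and lapply in place of by (a substitution, not the code above) and runs on made-up data; the names df, pieces and closest are placeholders:

```r
# Toy data with the question's column names (values are made up)
set.seed(42)
df <- data.frame(value = runif(600, -100, 100),
                 experiment = sample(paste0("E", 1:3), 600, replace = TRUE),
                 ratio = sample(c("X.Y", "Y.Z", "Z.X"), 600, replace = TRUE))

# Split by ratio, then keep the (up to) 50 rows nearest 0 in each piece
pieces <- lapply(split(df, df$ratio),
                 function(x) x[head(order(abs(x$value)), 50), ])

# Stack the pieces back into one data frame
closest <- do.call(rbind, pieces)

# Rows kept per ratio
table(closest$ratio)
```

Ordering on abs(value) picks the 50 rows nearest 0 regardless of sign, so the result need not be an even 25/25 split between negative and positive values.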

Upvotes: 3
