Gil33

Reputation: 123

Selecting n random rows across arbitrary intervals

I need to select random rows from different numeric intervals that I've specified. The following topic is closely related, but there the rows were selected per factor level:

selecting n random rows across all levels of a factor within a dataframe

Using the same example data:

df <- data.frame(matrix(rnorm(80), nrow=40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each=10)

How could I select 4 rows (or any other n) where -1 < X1 < 0 and 4 rows where 0 ≤ X1 < 2?

Upvotes: 0

Views: 49

Answers (3)

akrun

Reputation: 886938

Try this:

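 n <- 4
 # row numbers that fall inside each interval
 indx1 <- with(df, which(X1 > -1 & X1 < 0))
 indx2 <- with(df, which(X1 >= 0 & X1 < 2))
 # draw n of those rows without replacement
 df[sample(indx1, n, replace=FALSE), ]
 df[sample(indx2, n, replace=FALSE), ]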
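
One caveat with the snippet above: `sample(x, n)` treats a single number `x` as `1:x`, and it raises an error when fewer than `n` candidates exist. A minimal sketch of a guard follows; the helper name `sample_rows` is hypothetical and not part of base R, and the seed is arbitrary.

```r
# Hypothetical helper: draw up to n rows of `data` from the row numbers in `idx`.
sample_rows <- function(data, idx, n) {
  k <- min(n, length(idx))                  # never ask for more rows than exist
  if (k == 0) return(data[0, , drop = FALSE])
  # Sampling positions rather than values avoids sample()'s 1:x behaviour
  # when idx has length one.
  picked <- idx[sample.int(length(idx), k)]
  data[picked, , drop = FALSE]
}

set.seed(42)  # arbitrary seed, only for reproducibility
df <- data.frame(matrix(rnorm(80), nrow = 40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each = 10)

indx1 <- with(df, which(X1 > -1 & X1 < 0))
indx2 <- with(df, which(X1 >= 0 & X1 < 2))
res1 <- sample_rows(df, indx1, 4)
res2 <- sample_rows(df, indx2, 4)
```

If an interval happens to contain fewer than 4 rows, the helper returns all of them instead of failing.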

Update

If you need to sample 'n' rows within each level of the grouping variable 'color', subject to the condition on 'X1':

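library(data.table)#v1.9.5+
# incbounds=FALSE excludes the endpoints, matching -1 < X1 < 0;
# groups with fewer than n matching rows are returned whole
setDT(df)[between(X1, -1, 0, incbounds=FALSE), if(n > .N) .SD else
           .SD[sample(.N, n, replace=FALSE)], by = color]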

The second condition on 'X1' can be handled the same way.
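
For the second interval, note that `between()` includes both endpoints by default, so a half-open range such as 0 ≤ X1 < 2 is easier to write with explicit comparisons. A minimal sketch, assuming a fresh table named `dt` (a hypothetical name) built the same way as the question's data:

```r
library(data.table)

set.seed(1)  # arbitrary seed, only for reproducibility
dt <- data.table(X1 = rnorm(40), X2 = rnorm(40),
                 color = rep(c("blue", "red", "yellow", "pink"), each = 10))
n <- 4

# Half-open interval 0 <= X1 < 2, sampled separately within each color.
# Groups with fewer than n matching rows are returned whole.
res <- dt[X1 >= 0 & X1 < 2,
          .SD[sample.int(.N, min(n, .N))],
          by = color]
```

Using `min(n, .N)` plays the same role as the `if (n > .N)` branch above.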

Upvotes: 2

JasonAizkalns

Reputation: 20463

With dplyr:

library(dplyr)

df %>%
  filter(X1 > -1 & X1 < 0) %>%
  sample_n(4)


df %>%
  filter(X1 >= 0 & X1 < 2) %>%
  sample_n(4)

You could abstract the number to select by doing something like this:

num_to_select <- 4

df %>%
  filter(X1 > -1 & X1 < 0) %>%
  sample_n(num_to_select)

Likewise, you could do the same with the lower and upper cutoffs:

num_to_select <- 4
lower_cutoff  <- -1
upper_cutoff  <- 0

df %>%
  filter(X1 > lower_cutoff & X1 < upper_cutoff) %>%
  sample_n(num_to_select)
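
Taking the abstraction one step further, the cutoffs and the sample size can be arguments of a small function. This is only a sketch: `sample_interval` is a hypothetical name, it assumes dplyr 1.0.0 or later for `slice_sample()`, and it uses the half-open convention lower ≤ X1 < upper from the question's second interval.

```r
library(dplyr)

# Hypothetical wrapper: sample up to n rows with lower <= X1 < upper.
sample_interval <- function(data, lower, upper, n) {
  hits <- data %>% filter(X1 >= lower, X1 < upper)
  # slice_sample() truncates n to the number of available rows,
  # whereas sample_n() raises an error when n exceeds them.
  hits %>% slice_sample(n = n)
}

set.seed(1)  # arbitrary seed, only for reproducibility
df <- data.frame(matrix(rnorm(80), nrow = 40))
picked <- sample_interval(df, lower = 0, upper = 2, n = 4)
```

An interval that matches no rows simply yields an empty data frame.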

Upvotes: 0

Alexey Ferapontov

Reputation: 5169

set.seed(1234)
df <- data.frame(matrix(rnorm(80), nrow=40))
df$color <-  rep(c("blue", "red", "yellow", "pink"), each=10)

# rows inside each interval
s1 <- subset(df, X1 < 0 & X1 > -1)
s2 <- subset(df, X1 < 2 & X1 >= 0)

# draw 4 rows from each subset
r1 <- s1[sample(nrow(s1), 4), ]
r2 <- s2[sample(nrow(s2), 4), ]

> r1
           X1         X2  color
18 -0.9111954 -0.7733534    red
22 -0.4906859  2.5489911 yellow
17 -0.5110095  1.6478175    red
11 -0.4771927 -1.8060313    red
> r2
          X1           X2 color
2  0.2774292 -1.068642724  blue
15 0.9594941 -0.162309524   red
6  0.5060559 -0.968514318  blue
31 1.1022975  0.006892838  pink
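
With more than two intervals, writing one `subset()` call per interval gets repetitive. One possible generalisation, sketched here in base R, bins `X1` with `cut()` and samples inside each bin. The names `breaks`, `bin`, `picked` and `result` are illustrative, and `right = FALSE` makes every bin left-closed, so the first bin is [-1, 0) rather than the open interval asked for in the question.

```r
set.seed(1234)
df <- data.frame(matrix(rnorm(80), nrow = 40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each = 10)

breaks <- c(-1, 0, 2)   # interval edges; values outside them become NA
n <- 4

# right = FALSE gives left-closed bins: [-1, 0) and [0, 2)
df$bin <- cut(df$X1, breaks = breaks, right = FALSE)

# split() drops rows whose bin is NA; then sample up to n rows per bin
picked <- lapply(split(df, df$bin), function(d) {
  d[sample.int(nrow(d), min(n, nrow(d))), , drop = FALSE]
})
result <- do.call(rbind, picked)
```

Adding further edges to `breaks` extends this to any number of intervals without extra code.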

Upvotes: 0
