Gustavo

Reputation: 85

Create random data from a subset in R

I have a data set with 10 rows and 5 columns. For example:

A <- c(15.0, 10.0, 5.50, 20, 22, 25, 30, 
         40, 50, 10.0)

B <- c(1, 30, 30, 6, 7, 10, 2, 25, 
         3, 27)

C <- c(1, 0, 0, 5, 15, 10, 20, 25, 
       30, 40)

D <- c(50, 100, 100, 500, 150, 100, 200, 250, 
       0, 0)

Date <- c("1997-05-01","1997-05-02","1997-05-03","1997-05-04","1997-05-05",
            "1997-05-06","1997-05-07","1997-05-08","1997-05-09","1997-05-10")

data <- data.frame(A, B, C, D, Date)

This gives a data frame in R:

  A      B      C     D      Date
----    ----  ----   ----    ----
15.0      1      1     50    1997-05-01
10.0     30      0    100    1997-05-02
etc...

The subset is defined by quantiles: I want the rows where A is less than or equal to its 25th percentile (11.25 for this data) and B is greater than or equal to its 75th percentile (26.5 for this data).

quantile(data$A, c(.25, .50, .75))

quantile(data$B, c(.25, .50, .75))

I can filter the data frame on those two conditions, which leaves 3 rows:

data[data$A <= quantile(data$A, 0.25) &
        data$B >= quantile(data$B, 0.75), ]
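
As a quick check, here is a minimal sketch of how the two thresholds and the matching rows can be inspected. It assumes R's default quantile method (type 7) and uses only columns A and B from the example above; the names q_a, q_b and keep are made up for illustration.

```r
# Columns A and B from the example data (other columns left out for brevity)
A <- c(15.0, 10.0, 5.50, 20, 22, 25, 30, 40, 50, 10.0)
B <- c(1, 30, 30, 6, 7, 10, 2, 25, 3, 27)
data <- data.frame(A, B)

# Thresholds: 25th percentile of A and 75th percentile of B
q_a <- quantile(data$A, 0.25)  # 11.25 with the default method
q_b <- quantile(data$B, 0.75)  # 26.5 with the default method

# Positions of the rows that satisfy both conditions
keep <- which(data$A <= q_a & data$B >= q_b)
keep  # rows 2, 3 and 10
```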

From this subset of 3 rows, I would like to create a random data set with the same number of rows as the original data (10 in this case). For example, the new data could be:

  A      B      C     D      Date
----    ----  ----   ----    ----
10.0     30     0     100    1997-05-02
5.5      30     0     100    1997-05-03
10.0     27     40     0     1997-05-10
5.5      30     0     100    1997-05-03
10.0     27     40     0     1997-05-10
10.0     30     0     100    1997-05-02
10.0     27     40     0     1997-05-10
5.5      30     0     100    1997-05-03
10.0     27     40     0     1997-05-10
10.0     30     0     100    1997-05-02

What is the best way to do that?

Thank you!

Upvotes: 0

Views: 138

Answers (2)

Sotos

Reputation: 51592

One mathematically oriented way to do it (note that this repeats the subset in order rather than sampling it at random):

d3 <- data[data$A <= quantile(data$A, 0.25) &
           data$B >= quantile(data$B, 0.75), ]

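# Repeat the whole subset as often as it fits, then pad with its first rows.
# seq_len() gives an empty index when the remainder is 0 (1:0 would pick row 1).
final_df <- rbind(d3[rep(seq_len(nrow(d3)), nrow(data) %/% nrow(d3)), ],
                  d3[seq_len(nrow(data) %% nrow(d3)), ])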
rownames(final_df) <- NULL
final_df
#      A  B  C   D       Date
#1  10.0 30  0 100 1997-05-02
#2   5.5 30  0 100 1997-05-03
#3  10.0 27 40   0 1997-05-10
#4  10.0 30  0 100 1997-05-02
#5   5.5 30  0 100 1997-05-03
#6  10.0 27 40   0 1997-05-10
#7  10.0 30  0 100 1997-05-02
#8   5.5 30  0 100 1997-05-03
#9  10.0 27 40   0 1997-05-10
#10 10.0 30  0 100 1997-05-02
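
The same cycling can probably be written more compactly with rep_len(), which recycles an index vector up to a requested length. This is only a sketch of an alternative, not a replacement for the code above; idx is a name made up for illustration, and like the original it is deterministic rather than random.

```r
# Example data from the question
A <- c(15.0, 10.0, 5.50, 20, 22, 25, 30, 40, 50, 10.0)
B <- c(1, 30, 30, 6, 7, 10, 2, 25, 3, 27)
C <- c(1, 0, 0, 5, 15, 10, 20, 25, 30, 40)
D <- c(50, 100, 100, 500, 150, 100, 200, 250, 0, 0)
Date <- c("1997-05-01", "1997-05-02", "1997-05-03", "1997-05-04", "1997-05-05",
          "1997-05-06", "1997-05-07", "1997-05-08", "1997-05-09", "1997-05-10")
data <- data.frame(A, B, C, D, Date)

# Subset on the two quantile conditions
d3 <- data[data$A <= quantile(data$A, 0.25) &
           data$B >= quantile(data$B, 0.75), ]

# Recycle the subset's row positions until there are nrow(data) of them
idx <- rep_len(seq_len(nrow(d3)), nrow(data))
final_df <- d3[idx, ]
rownames(final_df) <- NULL
final_df
```

For the 10-row example this should give the same rows, in the same order, as the rbind() version.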

Upvotes: 1

Axeman

Reputation: 35382

Perhaps you would like something like this?

d_filtered <- data[data$A <= quantile(data$A, 0.25) &
                     data$B >= quantile(data$B, 0.75), ]
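# Draw nrow(data) row positions from the subset, with replacement
d_new <- d_filtered[sample(seq_len(nrow(d_filtered)), nrow(data), replace = TRUE), ]
d_new  # rows are sampled at random, so the output differs between runs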
       A  B  C   D       Date
2    10.0 30  0 100 1997-05-02
3     5.5 30  0 100 1997-05-03
3.1   5.5 30  0 100 1997-05-03
3.2   5.5 30  0 100 1997-05-03
10   10.0 27 40   0 1997-05-10
3.3   5.5 30  0 100 1997-05-03
2.1  10.0 30  0 100 1997-05-02
2.2  10.0 30  0 100 1997-05-02
10.1 10.0 27 40   0 1997-05-10
2.3  10.0 30  0 100 1997-05-02
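
If the result needs to be reproducible, one option is to fix the random seed first and to reset the row names afterwards. The sketch below assumes an arbitrary seed of 42 and uses sample.int() to draw the row positions; idx is a name made up for illustration.

```r
# Example data from the question
A <- c(15.0, 10.0, 5.50, 20, 22, 25, 30, 40, 50, 10.0)
B <- c(1, 30, 30, 6, 7, 10, 2, 25, 3, 27)
C <- c(1, 0, 0, 5, 15, 10, 20, 25, 30, 40)
D <- c(50, 100, 100, 500, 150, 100, 200, 250, 0, 0)
Date <- c("1997-05-01", "1997-05-02", "1997-05-03", "1997-05-04", "1997-05-05",
          "1997-05-06", "1997-05-07", "1997-05-08", "1997-05-09", "1997-05-10")
data <- data.frame(A, B, C, D, Date)

# Subset on the two quantile conditions
d_filtered <- data[data$A <= quantile(data$A, 0.25) &
                     data$B >= quantile(data$B, 0.75), ]

set.seed(42)  # arbitrary seed (an assumption), only to make the draw repeatable

# Draw nrow(data) row positions from the subset, with replacement
idx <- sample.int(nrow(d_filtered), size = nrow(data), replace = TRUE)
d_new <- d_filtered[idx, ]

# Replace the 2, 3.1, 10.1, ... row names with plain 1..n
rownames(d_new) <- NULL
d_new
```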

Upvotes: 1
