Reputation: 85
I have a data set with 10 rows and 5 columns. For example:
A <- c(15.0, 10.0, 5.50, 20, 22, 25, 30,
40, 50, 10.0)
B <- c(1, 30, 30, 6, 7, 10, 2, 25,
3, 27)
C <- c(1, 0, 0, 5, 15, 10, 20, 25,
30, 40)
D <- c(50, 100, 100, 500, 150, 100, 200, 250,
0, 0)
Date <- c("1997-05-01","1997-05-02","1997-05-03","1997-05-04","1997-05-05",
"1997-05-06","1997-05-07","1997-05-08","1997-05-09","1997-05-10")
data <- data.frame(A, B, C, D, Date)
Thus, I have a data table in R:
A B C D Date
---- ---- ---- ---- ----
15.0 1 1 50 1997-05-01
10.0 30 0 100 1997-05-02
etc...
The range is based on quantiles. For A, I want the values less than or equal to the 25th percentile (11.25 for this data), and for B, the values greater than or equal to the 75th percentile (26.5 for this data):
quantile(data$A, c(.25, .50, .75))
quantile(data$B, c(.25, .50, .75))
One way is to filter the data frame on those two conditions:
data[data$A <= quantile(data$A, 0.25) &
data$B >= quantile(data$B, 0.75), ]
So, I would like to create a random data set with the same number of rows as the original (10 in this case) by sampling from this subset of 3 rows. For example, the new data could be:
A B C D Date
---- ---- ---- ---- ----
10.0 30 0 100 1997-05-02
5.5 30 0 100 1997-05-03
10.0 27 40 0 1997-05-10
5.5 30 0 100 1997-05-03
10.0 27 40 0 1997-05-10
10.0 30 0 100 1997-05-02
10.0 27 40 0 1997-05-10
5.5 30 0 100 1997-05-03
10.0 27 40 0 1997-05-10
10.0 30 0 100 1997-05-02
What is the best way to do that?
Thank you!
Upvotes: 0
Views: 138
Reputation: 51592
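One mathematically oriented way to do it is to repeat the rows of the subset in order until the result has as many rows as the original data: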
d3 <- data[data$A <= quantile(data$A, 0.25) &
data$B >= quantile(data$B, 0.75), ]
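# Whole repetitions of the subset, then the leftover rows.
# seq_len() returns an empty index when the remainder is 0, whereas 1:0 would pick row 1.
final_df <- rbind(d3[rep(seq_len(nrow(d3)), floor(nrow(data)/nrow(d3))),],
d3[seq_len(nrow(data) %% nrow(d3)),])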
rownames(final_df) <- NULL
final_df
# A B C D Date
#1 10.0 30 0 100 1997-05-02
#2 5.5 30 0 100 1997-05-03
#3 10.0 27 40 0 1997-05-10
#4 10.0 30 0 100 1997-05-02
#5 5.5 30 0 100 1997-05-03
#6 10.0 27 40 0 1997-05-10
#7 10.0 30 0 100 1997-05-02
#8 5.5 30 0 100 1997-05-03
#9 10.0 27 40 0 1997-05-10
#10 10.0 30 0 100 1997-05-02
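The same cyclic repetition can probably be written more compactly with base R's rep_len(), which recycles a vector to a requested length. This is only a minimal sketch: it rebuilds the question's data so that it runs on its own, the names idx and final_df2 are made up for illustration, and it assumes the filtered subset has at least one row.

```r
# Rebuild the example data from the question.
data <- data.frame(
  A = c(15.0, 10.0, 5.50, 20, 22, 25, 30, 40, 50, 10.0),
  B = c(1, 30, 30, 6, 7, 10, 2, 25, 3, 27),
  C = c(1, 0, 0, 5, 15, 10, 20, 25, 30, 40),
  D = c(50, 100, 100, 500, 150, 100, 200, 250, 0, 0),
  Date = as.character(seq(as.Date("1997-05-01"), by = "day", length.out = 10))
)

# Same filter as above: A at or below its 25% quantile, B at or above its 75% quantile.
d3 <- data[data$A <= quantile(data$A, 0.25) &
           data$B >= quantile(data$B, 0.75), ]

# Recycle the row positions 1..nrow(d3) until there are nrow(data) of them.
# Assumption: nrow(d3) > 0, otherwise the index would be filled with NA.
idx <- rep_len(seq_len(nrow(d3)), nrow(data))

final_df2 <- d3[idx, ]
rownames(final_df2) <- NULL
final_df2
```

Note that this, like the rbind() version, is deterministic: the rows come back in a fixed repeating order rather than a random one.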
Upvotes: 1
Reputation: 35382
Perhaps you would like something like this?
d_filtered <- data[data$A <= quantile(data$A, 0.25) &
data$B >= quantile(data$B, 0.75), ]
d_new <- d_filtered[sample(seq_len(nrow(d_filtered)), nrow(data), replace = TRUE), ]
A B C D Date
2 10.0 30 0 100 1997-05-02
3 5.5 30 0 100 1997-05-03
3.1 5.5 30 0 100 1997-05-03
3.2 5.5 30 0 100 1997-05-03
10 10.0 27 40 0 1997-05-10
3.3 5.5 30 0 100 1997-05-03
2.1 10.0 30 0 100 1997-05-02
2.2 10.0 30 0 100 1997-05-02
10.1 10.0 27 40 0 1997-05-10
2.3 10.0 30 0 100 1997-05-02
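Because sample() draws at random, the rows shown here will differ from run to run. If a reproducible result is wanted, one option is to fix the seed first, and to reset the row names afterwards so they do not carry suffixes like 3.1. A sketch under those assumptions follows; the helper name resample_rows and the seed value 42 are arbitrary choices for illustration, not part of the answer above.

```r
# Hypothetical helper: draw n rows from df at random, with replacement.
resample_rows <- function(df, n) {
  stopifnot(nrow(df) > 0)                  # nothing to sample from otherwise
  picked <- sample.int(nrow(df), n, replace = TRUE)
  out <- df[picked, , drop = FALSE]
  rownames(out) <- NULL                    # replace names like "3.1" with 1..n
  out
}

# Rebuild the example data from the question.
data <- data.frame(
  A = c(15.0, 10.0, 5.50, 20, 22, 25, 30, 40, 50, 10.0),
  B = c(1, 30, 30, 6, 7, 10, 2, 25, 3, 27),
  C = c(1, 0, 0, 5, 15, 10, 20, 25, 30, 40),
  D = c(50, 100, 100, 500, 150, 100, 200, 250, 0, 0),
  Date = as.character(seq(as.Date("1997-05-01"), by = "day", length.out = 10))
)

d_filtered <- data[data$A <= quantile(data$A, 0.25) &
                   data$B >= quantile(data$B, 0.75), ]

set.seed(42)  # arbitrary seed, only so repeated runs give the same draw
d_new <- resample_rows(d_filtered, nrow(data))
d_new
```

sample.int() is used instead of sample() on an index vector mainly to keep the call short; either should work here as long as the filtered subset is not empty.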
Upvotes: 1