Reputation: 395
I have a data set similar to the following with 1 column and 60 rows:
value
1 0.0423
2 0.0388
3 0.0386
4 0.0342
5 0.0296
6 0.0276
7 0.0246
8 0.0239
9 0.0234
10 0.0214
.
40 0.1424
.
60 -0.0312
I want to reorder the rows so that certain conditions are met. For example one condition could be: sum(df$value[4:7]) > 0.1000
& sum(df$value[4:7]) <0.1100
With the data set looking like this for example.
value
1 0.0423
2 0.0388
3 0.0386
4 0.1312
5 -0.0312
6 0.0276
7 0.0246
8 0.0239
9 0.0234
10 0.0214
.
.
.
60 0.0342
What I tried was using repeat
and sample
as in the following:
repeat{
df1 <- as_tibble(sample(sdf$value, replace = TRUE))
if (sum(df$value[4:7]) > 0.1000 & sum(df$value[4:7]) <0.1100) break
}
Unfortunately, this method takes quite some time and I was wondering if there is a faster way to reorder rows based on mathematical conditions such as sum
or prod
Upvotes: 0
Views: 413
Reputation: 60060
Here's a quick implementation of the hill-climbing method I outlined in my comment. I've had to slightly reframe the desired condition as "distance of sum(x[4:7])
from 0.105" to make it continuous, although you can still use the exact condition when doing the check that all requirements are satisfied. The benefit is that you can add extra conditions to the distance function easily.
# Using same example data as Jon Spring
set.seed(42)
vs = rnorm(60, 0.05, 0.08)
get_distance = function(x) {
distance = abs(sum(x[4:7]) - 0.105)
# Add to the distance with further conditions if needed
distance
}
max_attempts = 10000
best_distance = Inf
swaps_made = 0
for (step in 1:max_attempts) {
# Copy the vector and swap two random values
new_vs = vs
swap_inds = sample.int(length(vs), 2, replace = FALSE)
new_vs[swap_inds] = rev(new_vs[swap_inds])
# Keep the new vector if the distance has improved
new_distance = get_distance(new_vs)
if (new_distance < best_distance) {
vs = new_vs
best_distance = new_distance
swaps_made = swaps_made + 1
}
complete = (sum(vs[4:7]) < 0.11) & (sum(vs[4:7]) > 0.1)
if (complete) {
print(paste0("Solution found in ", step, " steps"))
break
}
}
sum(vs[4:7])
There's no real guarantee that this method will reach a solution, but I often try this kind of basic hill-climbing when I'm not sure if there's a "smart" way to approach a problem.
Upvotes: 2
Reputation: 66415
Here's an approach using combn
from base R, and then filtering using dplyr
. (I'm sure there's a way w/o it but my base-fu isn't there yet.)
With only 4 numbers from a pool of 60, there are "only" 488k different combinations (ignoring order; =60*59*58*57/4/3/2), so it's quick to brute force in about a second.
# Make a vector of 60 numbers like your example
set.seed(42)
my_nums <- rnorm(60, 0.05, 0.08);
all_combos <- combn(my_nums, 4) # Get all unique combos of 4 numbers
library(tidyverse)
combos_table <- all_combos %>%
t() %>%
as_tibble() %>%
mutate(sum = V1 + V2 + V3 + V4) %>%
filter(sum > 0.1, sum < 0.11)
> combos_table
# A tibble: 8,989 x 5
V1 V2 V3 V4 sum
<dbl> <dbl> <dbl> <dbl> <dbl>
1 0.160 0.00482 0.0791 -0.143 0.100
2 0.160 0.00482 0.101 -0.163 0.103
3 0.160 0.00482 0.0823 -0.145 0.102
4 0.160 0.00482 0.0823 -0.143 0.104
5 0.160 0.00482 -0.0611 -0.00120 0.102
6 0.160 0.00482 -0.0611 0.00129 0.105
7 0.160 0.00482 0.0277 -0.0911 0.101
8 0.160 0.00482 0.0277 -0.0874 0.105
9 0.160 0.00482 0.101 -0.163 0.103
10 0.160 0.00482 0.0273 -0.0911 0.101
# … with 8,979 more rows
This says that in this example, there are about 9000 different sets of 4 numbers from my sequence which meet the criteria. We could pick any of these and put them in positions 4-7 to meet your requirement.
Upvotes: 1