How can I select n samples out of m, evenly spread within a range?

Question

Let's say I want to keep n backups out of a set of m backups taken in a month of d days.

For example: I have 30 days (d=30), I made a backup every day, so m=30, and I want to keep 3 of them, spread as evenly as possible (n=3).

So my input with these parameters would be, for January 2022:

And the output for this scenario:

20220101
20220115
20220130

But I want this to be driven by n, m and d, because I may have fewer than 30 backups in a month and they may not be evenly spread (for example only from the 1st to the 15th, or one every 3 days), and I still want to keep 3 (or n) of them, spread as evenly as possible.

I have been thinking about this for a while, and I think the algorithm should be something simple, but I still can't work it out. I would like to do it in bash or perl, but just getting the algorithm would be more than enough help.

James Brown · Accepted Answer

Using awk. Try setting the -v n=3 to other values:

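$ awk -v n=3 '
{
    a[NR]=$0                                   # store dates to an array, ordered
}                                              # built-in record count NR is m
END {                                          # after all values were stored
    print a[1]                                 # print the first value
    for(i=1;i<=(n-1);i++)                      # loop over the remaining n-1 picks
        print a[(y=int(x=NR*i/(n-1)))<x?y+1:y] # ceil()-style index: round NR*i/(n-1) up
}' file                                        # file: placeholder name, one sorted date per line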

Output:
20220101
20220115
20220130
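
To see which line positions this approach picks for a given m and n, the index arithmetic can be run on its own. The following is only a minimal sketch: spread_indices is a hypothetical helper name, and it assumes 2 <= n <= m (n=1 would divide by zero, and n > m would repeat positions).

```shell
# spread_indices M N: print the 1-based positions picked for M lines and N samples.
# Hypothetical helper for illustration; assumes 2 <= N <= M.
spread_indices() {
    awk -v m="$1" -v n="$2" 'BEGIN {
        print 1                              # the first line is always kept
        for (i = 1; i <= n - 1; i++) {
            x = m * i / (n - 1)              # ideal (possibly fractional) position
            y = int(x)                       # truncate toward zero
            print (y < x ? y + 1 : y)        # round up when truncation lost a fraction
        }
    }'
}

spread_indices 30 3   # prints 1, 15 and 30, one per line
```

With m=30 and n=3 the positions are 1, 15 and 30, which matches the output above.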

Updated: The bare int() in print a[int()] was not enough, so I replaced it with a ceil()-like implementation (an improvisation of this solution). It now gives a better result for @GerardH.Pille's sample in the comments (thanks for pointing it out):
20220101
20220103  # this was 20220102 previously
20220131

However, this small solution looks only at the positions of the lines, not at the dates themselves, so it relies on the values being evenly spread and can't always produce the optimal output.
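
One possible way around that limitation is to look at the dates themselves. The sketch below is not part of the original answer: pick_by_date is a hypothetical helper, and it assumes sorted YYYYMMDD input, all dates in the same month, and 2 <= n <= m. It places n target days evenly between the earliest and latest backup and, for each target, greedily keeps the closest backup that has not been kept yet. This is a heuristic and is not guaranteed to be optimal either.

```shell
# pick_by_date N: read sorted YYYYMMDD dates (same month) on stdin, print N of them.
# Hypothetical helper for illustration; assumes 2 <= N <= number of input lines.
pick_by_date() {
    awk -v n="$1" '
    {
        date[NR] = $0
        day[NR]  = substr($0, 7, 2) + 0             # day of month as a number
    }
    END {
        lo = day[1]; hi = day[NR]                   # earliest and latest available day
        for (i = 0; i < n; i++) {
            target = lo + (hi - lo) * i / (n - 1)   # evenly spaced target day
            best = 0
            for (j = 1; j <= NR; j++) {
                if (j in used) continue             # each backup is kept at most once
                d = day[j] - target
                if (d < 0) d = -d                   # absolute distance in days
                if (best == 0 || d < bestd) { best = j; bestd = d }
            }
            used[best] = 1
        }
        for (j = 1; j <= NR; j++)                   # print the kept dates in input order
            if (j in used) print date[j]
    }'
}

# Unevenly spread backups: the targets are days 1, 16 and 31.
printf '%s\n' 20220101 20220102 20220103 20220105 \
              20220108 20220113 20220121 20220131 | pick_by_date 3
```

For this input the sketch keeps 20220101, 20220113 and 20220131, whereas the position-based version would keep the 4th line (20220105) as the middle value.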
