Algorithm for grouping train trips

Question

Imagine you have a full calendar year in front of you. On some days you take the train, potentially even a few times in a single day and each trip could be to a different location (I.E. The amount you pay for the ticket can be different for each trip).

So you would have data that looked like this:
Date: 2018-01-01, Amount: $5
Date: 2018-01-01, Amount: $6
Date: 2018-01-04, Amount: $2
Date: 2018-01-06, Amount: $4
...
Now you have to group this data into buckets. A bucket can span up to 31 consecutive days (no gaps) and cannot overlap another bucket.

If a bucket has less than 32 train trips it will be blue. If it has 32 or more train trips in it, it will be red. The buckets will also get a value based on the sum of the ticket cost.

After you group all the trips the blue buckets get thrown out. And the value of all the red buckets gets summed up, we will call this the prize.

The goal, is to get the highest value for the prize.

This is the problem I have. I cant think of a good algorithm to do this. If anyone knows a good way to approach this I would like to hear it. Or if you know of anywhere else that can help with designing algorithms like this.

Gassa · Accepted Answer

This can be solved by dynamic programming.

First, sort the records by date, and consider them in that order. Let day (1), day (2), ..., day (n) be the days where the tickets were bought. Let cost (1), cost (2), ..., cost (n) be the respective ticket costs.

Let fun (k) be the best prize if we consider only the first k records. Our dynamic programming solution will calculate fun (0), fun (1), fun (2), ..., fun (n-1), fun (n), using the previous values to calculate the next one.

Base: fun (0) = 0.

Transition: What is the optimal solution, fun (k), if we consider only the first k records? There are two possibilities: either the k-th record is dropped, then the solution is the same as fun (k-1), or the k-th record is the last record of a bucket. Let us then consider all possible buckets ending with the k-th record in a loop, as explained below.

Look at records k, k-1, k-2, ..., down to the very first record. Let the current index be i. If the records from i to k span more than 31 consecutive days, break from the loop. Otherwise, if the number of records, k-i+1, is at least 32, we can solve the subproblem fun (i-1) and then add the records from i to k, getting a prize of cost (i) + cost (i+1) + ... + cost (k). The value fun (k) is the maximum of these possibilities, along with the possibility to drop the k-th record.

Answer: it is just fun (n), the case where we considered all the records.

In pseudocode:

fun[0] = 0
for k = 1, 2, ..., n:
    fun[k] = fun[k-1]
    cost_i_to_k = 0
    for i = k, k-1, ..., 1:
        if day[k] - day[i] > 31:
            break
        cost_i_to_k += cost[i]
        if k-i+1 >= 32:
            fun[k] = max (fun[k], fun[i-1] + cost_i_to_k)
return fun[n]

It is not clear whether we are allowed to split records on a single day into different buckets. If the answer is no, we will have to enforce it by not considering buckets starting or ending between records in a single day. Technically, it can be done by a couple of if statements. Another way is to consider days instead of records: instead of tickets which have day and cost, we will work with days. Each day will have cost, the total cost of tickets on that day, and quantity, the number of tickets.

Edit: as per comment, we indeed can not split any single day. Then, after some preprocessing to get days records instead of tickets records, we can go as follows, in pseudocode:

fun[0] = 0
for k = 1, 2, ..., n:
    fun[k] = fun[k-1]
    cost_i_to_k = 0
    quantity_i_to_k = 0
    for i = k, k-1, ..., 1:
        if k-i+1 > 31:
            break
        cost_i_to_k += cost[i]
        quantity_i_to_k += quantity[i]
        if quantity_i_to_k >= 32:
            fun[k] = max (fun[k], fun[i-1] + cost_i_to_k)
return fun[n]

Here, i and k are numbers of days. Note that we consider all possible days in the range: if there are no tickets for a particular day, we just use zeroes as its cost and quantity values.

Edit2: The above allows us to calculate the maximum total prize, but what about the actual configuration of buckets which gets us there? The general method will be backtracking: at position k, we will want to know how we got fun (k), and transition to either k-1 if the optimal way was to skip k-th record, or from k to i-1 for such i that the equation fun[k] = fun[i-1] + cost_i_to_k holds. We proceed until i goes down to zero.

One of the two usual implementation approaches is to store par (k), a "parent", along with fun (k), which encodes how exactly we got the maximum. Say, if par (k) = -1, the optimal solution skips k-th record. Otherwise, we store the optimal index i in par (k), so that the optimal solution takes a bucket of records i to k inclusive.

The other approach is to store nothing extra. Rather, we run a slight modification code which calculates fun (k). But instead of assigning things to fun (k), we compare the right part of the assignment to the final value fun (k) we already got. As soon as they are equal, we found the right transition.

In pseudocode, using the second approach, and days instead of individual records:

k = n
while k > 0:
    k = prev (k)

function prev (k):
    if fun[k] == fun[k-1]:
         return k-1
    cost_i_to_k = 0
    quantity_i_to_k = 0
    for i = k, k-1, ..., 1:
        if k-i+1 > 31:
            break
        cost_i_to_k += cost[i]
        quantity_i_to_k += quantity[i]
        if quantity_i_to_k >= 32:
            if fun[k] == fun[i-1] + cost_i_to_k:
                writeln ("bucket from $ to $: cost $, quantity $",
                    i, k, cost_i_to_k, quantity_i_to_k)
                return i-1
    assert (false, "can't happen")

Algorithm for grouping train trips

Answers (2)

Related Questions