Curtis
Curtis

Reputation: 3449

Algorithm for grouping train trips

Imagine you have a full calendar year in front of you. On some days you take the train, potentially even a few times in a single day and each trip could be to a different location (I.E. The amount you pay for the ticket can be different for each trip).

So you would have data that looked like this:

Date: 2018-01-01, Amount: $5
Date: 2018-01-01, Amount: $6
Date: 2018-01-04, Amount: $2
Date: 2018-01-06, Amount: $4
...

Now you have to group this data into buckets. A bucket can span up to 31 consecutive days (no gaps) and cannot overlap another bucket.

If a bucket has less than 32 train trips it will be blue. If it has 32 or more train trips in it, it will be red. The buckets will also get a value based on the sum of the ticket cost.

After you group all the trips the blue buckets get thrown out. And the value of all the red buckets gets summed up, we will call this the prize.

The goal, is to get the highest value for the prize.

This is the problem I have. I cant think of a good algorithm to do this. If anyone knows a good way to approach this I would like to hear it. Or if you know of anywhere else that can help with designing algorithms like this.

Upvotes: 2

Views: 239

Answers (2)

Gassa
Gassa

Reputation: 8846

This can be solved by dynamic programming.

First, sort the records by date, and consider them in that order. Let day (1), day (2), ..., day (n) be the days where the tickets were bought. Let cost (1), cost (2), ..., cost (n) be the respective ticket costs.

Let fun (k) be the best prize if we consider only the first k records. Our dynamic programming solution will calculate fun (0), fun (1), fun (2), ..., fun (n-1), fun (n), using the previous values to calculate the next one.

Base: fun (0) = 0.

Transition: What is the optimal solution, fun (k), if we consider only the first k records? There are two possibilities: either the k-th record is dropped, then the solution is the same as fun (k-1), or the k-th record is the last record of a bucket. Let us then consider all possible buckets ending with the k-th record in a loop, as explained below.

Look at records k, k-1, k-2, ..., down to the very first record. Let the current index be i. If the records from i to k span more than 31 consecutive days, break from the loop. Otherwise, if the number of records, k-i+1, is at least 32, we can solve the subproblem fun (i-1) and then add the records from i to k, getting a prize of cost (i) + cost (i+1) + ... + cost (k). The value fun (k) is the maximum of these possibilities, along with the possibility to drop the k-th record.

Answer: it is just fun (n), the case where we considered all the records.


In pseudocode:

fun[0] = 0
for k = 1, 2, ..., n:
    fun[k] = fun[k-1]
    cost_i_to_k = 0
    for i = k, k-1, ..., 1:
        if day[k] - day[i] > 31:
            break
        cost_i_to_k += cost[i]
        if k-i+1 >= 32:
            fun[k] = max (fun[k], fun[i-1] + cost_i_to_k)
return fun[n]

It is not clear whether we are allowed to split records on a single day into different buckets. If the answer is no, we will have to enforce it by not considering buckets starting or ending between records in a single day. Technically, it can be done by a couple of if statements. Another way is to consider days instead of records: instead of tickets which have day and cost, we will work with days. Each day will have cost, the total cost of tickets on that day, and quantity, the number of tickets.


Edit: as per comment, we indeed can not split any single day. Then, after some preprocessing to get days records instead of tickets records, we can go as follows, in pseudocode:

fun[0] = 0
for k = 1, 2, ..., n:
    fun[k] = fun[k-1]
    cost_i_to_k = 0
    quantity_i_to_k = 0
    for i = k, k-1, ..., 1:
        if k-i+1 > 31:
            break
        cost_i_to_k += cost[i]
        quantity_i_to_k += quantity[i]
        if quantity_i_to_k >= 32:
            fun[k] = max (fun[k], fun[i-1] + cost_i_to_k)
return fun[n]

Here, i and k are numbers of days. Note that we consider all possible days in the range: if there are no tickets for a particular day, we just use zeroes as its cost and quantity values.


Edit2: The above allows us to calculate the maximum total prize, but what about the actual configuration of buckets which gets us there? The general method will be backtracking: at position k, we will want to know how we got fun (k), and transition to either k-1 if the optimal way was to skip k-th record, or from k to i-1 for such i that the equation fun[k] = fun[i-1] + cost_i_to_k holds. We proceed until i goes down to zero.

One of the two usual implementation approaches is to store par (k), a "parent", along with fun (k), which encodes how exactly we got the maximum. Say, if par (k) = -1, the optimal solution skips k-th record. Otherwise, we store the optimal index i in par (k), so that the optimal solution takes a bucket of records i to k inclusive.

The other approach is to store nothing extra. Rather, we run a slight modification code which calculates fun (k). But instead of assigning things to fun (k), we compare the right part of the assignment to the final value fun (k) we already got. As soon as they are equal, we found the right transition.


In pseudocode, using the second approach, and days instead of individual records:

k = n
while k > 0:
    k = prev (k)

function prev (k):
    if fun[k] == fun[k-1]:
         return k-1
    cost_i_to_k = 0
    quantity_i_to_k = 0
    for i = k, k-1, ..., 1:
        if k-i+1 > 31:
            break
        cost_i_to_k += cost[i]
        quantity_i_to_k += quantity[i]
        if quantity_i_to_k >= 32:
            if fun[k] == fun[i-1] + cost_i_to_k:
                writeln ("bucket from $ to $: cost $, quantity $",
                    i, k, cost_i_to_k, quantity_i_to_k)
                return i-1
    assert (false, "can't happen")

Upvotes: 2

user unknown
user unknown

Reputation: 36229

Simplify the challenge, but not too much, to make an overlookable example, which can be solved by hand.

That helps a lot in finding the right questions.

For example take only 10 days, and buckets of maximum length of 3:

symbolic representation of tickets per day

For building buckets and colorizing them, we need only the ticket count, here 0, 1, 2, 3.

On Average, we need more than one bucket per day, for example 2-0-2 is 4 tickets in 3 days. Or 1-1-3, 1-3, 1-3-1, 3-1-2, 1-2.

But We can only choose 2 red buckets: 2-0-2 and (1-1-3 or 1-3-3 or 3-1-2) since 1-2 in the end is only 3 tickets, but we need at least 4 (one more ticket than max day span per bucket).

But while 3-1-2 is obviously more tickets than 1-1-3 tickets, the value of less tickets might be higher.

The blue colored area is the less interesting one, because it doesn't feed itself, by ticket count.

Upvotes: 1

Related Questions