Andersnk
Andersnk

Reputation: 867

Project Euler Problem #18 Python - getting the wrong result. Why?

I'm trying to solve the Euler Projects as an exercise to learn Python, for the last few days after work, and am now at Problem 18

I looked at the problem, and thought it could be solved by using Dijkstra's algorithm, with the values of the nodes as negative integers, so as to find the "longest" path.

My solution seems to be almost correct (I get 1068) - which is to say wrong. It prints a path, but from what I can tell, it's not the right one. But having looked at it from some time, I can't tell why.

Perhaps this problem cannot be solved by my approach, and I need some other approach, like dynamic programming - or maybe my implementation of Dijkstra is faulty?

I'm pretty confident the parsing from file to graph is working as intended.

This is the data-set:

75
95 64
17 47 82
18 35 87 10
20 04 82 47 65
19 01 23 75 03 34
88 02 77 73 07 63 67
99 65 04 28 06 16 70 92
41 41 26 56 83 40 80 70 33
41 48 72 33 47 32 37 16 94 29
53 71 44 65 25 43 91 52 97 51 14
70 11 33 28 77 73 17 78 39 68 17 57
91 71 52 38 17 14 91 43 58 50 27 29 48
63 66 04 68 89 53 67 30 73 16 69 87 40 31
04 62 98 27 23 09 70 98 73 93 38 53 60 04 23

This is the code. It a fully "working example", as long as the path to the file with the content above is correct.

class Graph:
    def __init__(self):
        self.nodes = []
        self.edges = []

    def add_node(self, node):
        self.nodes.append(node)

    def add_edge(self, edge):
        self.edges.append(edge)

    def edges_to_node(self, n):
        edges = [edge for edge in self.edges if edge.node1.id == n.id]
        return edges


class Node:
    def __init__(self, id, value, goal):
        self.id = id
        self.value = value
        self.goal = goal
        self.visited = False
        self.distance = 10000
        self.previous = None

    def __str__(self):
        return "{} - {}".format(self.value, self.goal)

    def __repr__(self):
        return "{} - {}".format(self.value, self.goal)


class Edge:
    def __init__(self, node1, node2):
        self.node1 = node1
        self.node2 = node2


f = open("problem18.data", "r")

content = f.read()
lines = content.split("\n")
data = []

graph = Graph()
index_generator = 1
last_line = len(lines) - 1

for i in range(len(lines)):
    data.append([])
    numbers = lines[i].split()
    for number in numbers:
        goal = i == last_line
        data[-1].append(Node(index_generator, -int(number), goal))
        index_generator += 1


for i in range(len(data)):
    for j in range(len(data[i])):
        node = data[i][j]
        graph.add_node(node)
        if i != last_line:
            node2 = data[i+1][j]
            node3 = data[i+1][j+1]

            edge1 = Edge(node, node2)
            edge2 = Edge(node, node3)

            graph.add_edge(edge1)
            graph.add_edge(edge2)


def dijkstra(graph, start):

    start.distance = 0
    queue = [start]
    while len(queue):
        queue.sort(key=lambda x: x.value, reverse=True)
        current = queue.pop()
        current.visited = True

        if current.goal:
            return reconstrcut_path(start, current)

        edges = graph.edges_to_node(current)
        for edge in edges:
            neighbour = edge.node2
            if neighbour.visited:
                continue
            queue.append(neighbour)

            new_distance = current.distance + neighbour.value
            if new_distance < neighbour.distance:
                neighbour.distance = new_distance
                neighbour.previous = current

    return []


def reconstrcut_path(start, n):
    path = []
    current = n
    while current.id is not start.id:
        path.append(current)
        current = current.previous
    path.append(start)
    return path


path = dijkstra(graph, graph.nodes[0])

tally = 0
for node in path:
    number = max(node.value, -node.value)
    print(number)
    tally += number
print(tally)

Can you help me troubleshoot what is wrong with this solution?

EDIT: The console output of the run:

98
67
91
73
43
47
83
28
73
75
82
87
82
64
75
1068

Upvotes: 1

Views: 306

Answers (2)

user555045
user555045

Reputation: 64904

Consider this alternative data set:

01
00 01
00 00 01
00 00 00 01
99 00 00 00 01

At least with negated costs, Dijkstra would explore that path of 1s and the zeroes that are "just off the path", but nothing else. The node that takes an other step down that path of 1s is always the best node in the queue, and it ends in a goal node so the algorithm terminates without exploring the rest of the triangle. It would never even see that there is a 99 hiding in the bottom left corner.

Upvotes: 1

Prune
Prune

Reputation: 77837

Actually, dynamic programming will knock this off neatly. My solution for this and problem 67 is less than 20 lines.

The focus here is very much a Dijkstra approach: work your way down the triangle, maintaining the maximum path cost at each node. Row 1 is trivial:

75

Row 2 is similarly trivial, as both values are ends: each has only one possible path:

95+75 64+75

which evaluates to

170 139

Row 3 has two ends, but the middle value gives us the critical logic: keep the larger of the two paths:

17+170 47+max(170, 139) 82+139
  187       217          221

Row 4 has two middles ... just keep going with the process:

18+187 35+max(187, 217) 87+max(217, 221) 10+221
  205        252             308          231

Can you take it from here?

As a check for you, the correct answer is quite close to the one you originally got.


Your solution fails because you didn't apply Dijkstra's algorithm. That requires that you maintain the best path to each node you've reached in your search. Instead, you used a row-by-row greedy algotriothm: you kept only the best path so far in the entire pass.

Specifically, when you found the 98 near the right side of the bottom row, you forced an assumption that it was part of the optimum path. You continued this, row by row. The data set is configured specifically to make this approach fail. The best path starts with the 93 + 73 + 58 sequence.

You have to keep all paths in mind; there's a path that is not the best sum for the bottom couple of rows, but catches up in the middle rows while the "fat" path gets starved with some lower numbers in the middle.

Upvotes: 4

Related Questions