smithjonesjr

Reputation: 981

What are the sources of non-robustness in Dafny proofs?

I occasionally (not frequently, but often enough) see it happen that a proof will be working in Dafny, and then something apparently irrelevant changes (e.g., variable names, function definitions that aren't relevant to the proof, and so on) and the proof breaks.

I'm interested in knowing the technical reasons why this happens. I currently have a rough understanding of E-matching and quantifier instantiation. It's not obvious to me that these algorithms ought to be robust to such seemingly irrelevant input features, but nor is it obvious why they wouldn't be. I also don't know whether there are practical considerations in E-matching implementations that would result in non-robustness.

By "robust", I mean "will either always succeed or will always not succeed, independent of factors like variable names".

So, I'm interested in understanding the technical reasons (either practical or theoretical) for this non-robustness.

Part of the reason why I'd like to gain more of an intuition here is so that when I encounter such a situation, I can patch the proof in such a way as to make it more robust. (Instead, what usually happens now is that I will patch the proof in some arbitrary way, it gets fixed, and then it breaks again later.)

The sort of answer I'm hoping to find is something like "Z3 uses algorithm X, which uses complex heuristic Y to determine when to give up looking in a part of the search space, and it's clear that Y depends on the search order". An answer like that probably wouldn't help me write better Dafny proofs, but it would satisfy my curiosity.

Upvotes: 2

Views: 86

Answers (1)

alias

Reputation: 30525

There's no easy answer to any of these questions. Below, I'll summarize my experience with z3. Dafny folks might provide other advice, but I'll hazard a guess that most of this applies to all such (semi-)automated proof tools in existence.

It's well known that even changing variable names can make z3 respond sat or unknown; see this thread for a discussion.
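To make the point concrete, here's a minimal sketch using the z3 Python bindings (z3py). The two solvers get logically identical assertions that differ only in the names of the symbols; the toy axiom and the names are mine, purely for illustration. On something this tiny both calls agree, but nothing about the names matters semantically, and on hard quantified or nonlinear goals the internal term ordering induced by names can steer the heuristics down different paths.

```python
# Requires: pip install z3-solver
from z3 import Int, Function, IntSort, ForAll, Solver

def build(solver, prefix):
    # Same axioms, different symbol names (prefix only changes the names).
    f = Function(prefix + "_f", IntSort(), IntSort())
    x = Int(prefix + "_x")
    c = Int(prefix + "_c")
    solver.add(ForAll([x], f(x) >= x))   # axiom: f is inflationary
    solver.add(f(c) < c)                 # contradicts the axiom
    return solver

s1 = build(Solver(), "a")
s2 = build(Solver(), "zz")
print(s1.check(), s2.check())  # both unsat here; answers/timings can diverge
                               # on harder, Dafny-sized queries
```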

When you're using Dafny/Boogie etc., they perform a ton of transformations on your program, and even simple changes can cause the backend solver to exhibit wildly different behavior. (Note that sat should never become unsat or vice versa: that would be a legitimate bug. But things can become unknown, or take vastly different amounts of time.) Here's another thread discussing a similar phenomenon.

The fundamental reasons always go back to the random seed that z3 uses for the problem, the heuristics it picks, and a ton of settings that can impact the search process. Run z3 -pd in your terminal: as of today, there are more than 500 "parameters" you can tweak! It's nearly impossible to try all relevant options, let alone pick the "right" settings. There's research into modifying the parameters on the fly to pick the best-performing set for regression purposes, but that usually doesn't help since problems and solvers change constantly.
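For instance, here's a small z3py sketch of pinning down two of those parameters. smt.random_seed and the global timeout (in milliseconds) are real z3 options; the values chosen here are arbitrary, and the tiny query is just a placeholder for whatever Dafny/Boogie would generate.

```python
from z3 import set_param, Solver, Int

set_param("smt.random_seed", 42)   # seed used by the SMT core
set_param("timeout", 5000)         # give up after 5 seconds (milliseconds)

x = Int("x")
s = Solver()
s.add(x * x == 4)
print(s.check())                   # sat
```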

You can even get differing performance on the exact same problem just by varying the z3 parameter smt.random_seed=N: different values of N can lead to very different run times.
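If you want to see this yourself, something like the following times the same check under a handful of seeds. The nonlinear toy query is my own; on a problem this small the differences are noise, but on large Dafny/Boogie-generated queries they can be dramatic.

```python
import time
from z3 import set_param, Solver, Ints

def timed_check(seed):
    set_param("smt.random_seed", seed)   # affects solvers created afterwards
    x, y, z = Ints("x y z")
    s = Solver()
    s.add(x > 0, y > 0, z > 0, x * x + y * y == z * z, z < 100)
    start = time.time()
    result = s.check()
    return result, time.time() - start

for seed in (1, 7, 42, 1234):
    print(seed, *timed_check(seed))
```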

So, as a practitioner, there isn't really much you can do, unfortunately; especially if you are using a tool like Dafny/Boogie on top of z3, which adds its own magic. A simple idea is to put your many cores to use: start many instances of the solver on different cores with different seeds, heuristics, tactics, etc., and pick the fastest result. Of course, this is easier said than done; a rough sketch of the idea is below.
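As an illustration of the portfolio idea, here's a Python sketch that races several z3 processes with different seeds on a query that has been dumped to a file. query.smt2 and the seed values are placeholders of mine; it assumes z3 is on your PATH, and a real portfolio runner would also kill the losing processes instead of merely ignoring them.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed

SEEDS = [1, 7, 42, 1234]

def run(seed):
    # z3 accepts parameter assignments directly on the command line.
    try:
        out = subprocess.run(
            ["z3", f"smt.random_seed={seed}", "query.smt2"],
            capture_output=True, text=True, timeout=60,
        )
        return seed, out.stdout.strip()
    except subprocess.TimeoutExpired:
        return seed, "timeout"

with ThreadPoolExecutor(max_workers=len(SEEDS)) as pool:
    futures = [pool.submit(run, s) for s in SEEDS]
    for fut in as_completed(futures):
        seed, answer = fut.result()
        print(f"seed {seed}: {answer}")
        break  # take the first answer; the remaining runs are ignored
```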

To sum up, most of these solvers/systems work as black boxes. Even if you knew a lot about their internals, it would be hard to always use them "robustly". In fact, coming up with a solver/proof system that was immune to such variance would probably be worthy of a Ph.D. dissertation.

As an end-user, and assuming you're using SMTLib, there's a set of "common" tricks and pitfalls you should be aware of that can help you create better-performing models. See this answer for an overview.

But there's also no need to despair. Things are getting better, and today's solvers are many times more performant than yesteryear's. The SMT-competition results are a good way to track progress.

Best of luck!

Upvotes: 2
