Some Silly Z3 Scripts I Wrote
As part of writing Logic for Programmers I produced a lot of “chaff”, code samples and sections I wrote up and then threw away. Sometimes I found a better example for the same topic, sometimes I threw the topic away entirely. It felt bad to let everything all rot on my hard drive, so I’m sharing a bunch of chaff for a tool called “Z3”, which has all sorts of uses in software research.
First an explanation of what this tool actually is, and then some scripts in order of increasing interestingness. All examples will use the Python bindings (pip install z3-solver).
What is Z3?
Z3 is an SMT solver.
Okay, what is SMT?
An SMT (“Satisfiability Modulo Theories”) solver is a constraint solver that understands math and basic programming concepts. You give it some variables, some equations with those variables, and it tries to find you a model, or set of assignments that satisfy all those equations.
Imagine you’re 12 years old and taking Algebra I for the first time, and see this problem:
4x + 2y == 14
2x - y == 1
solve for x & y
You’re supposed to learn how to do solve this as a system of equations, but if you want to cheat yourself out of an education you can have Z3 solve this for you.
# pip install z3-solver
import z3
s = z3.Solver()
x, y = z3.Ints('x y')
s.add(4*x + 2*y == 14)
s.add(2*x - y == 1)
result = s.check()
print(result)
if result == z3.sat:
print(s.model())
This is the most common way to use Z3: instantiate a solver, add constraints to the solver, and check it.1 Ints('x y') creates two “constants”. When we call s.check(), the solver tries to find values for the constants that satisfy all our constraints (the model). Here it prints “sat” followed by [x = 2, y = 3]. If there were multiple answers it would return the first one it found, and if there were none it would print “unsat”.
That’s the “satisfiability” part, anyway. The “Modulo Theories” part comes from the fact that Z3 carries a bunch of specialized solvers for different domains, or “theories”. This means it can handle constraints involving arrays, regexes, quantified expressions, etc.
From now on we’ll be like the cool kids and import the whole Z3 API into our namespace.
The Scripts
A Silly Math Question I Had
Are there four distinct positive integers
a, b, c, dwherea + b = c + danda * b = c * d?
The most direct way to express all of the constraints is like this:
from z3 import *
s = Solver()
a, b, c, d = Ints('a b c d')
s.add(a > 0, b > 0, c > 0, d > 0)
s.add(a + b == c + d)
s.add(a * b == c * d)
# We can add many constraints in one `add` call
s.add(a != b, a != c, a != d,
b != c, b != d, c != d)
print(s.check())
This returns “unsat”, meaning no such pair of pairs exists. In retrospect I should have expected this; see the dropdown for a proof.
show proof
For (a, b) and (c, d) to have the same sum, there must be some e where (c, d) = (a + e, b - e). Then cd = (a+e)(b-e) = ab + be - ae - ee, which = ab when ee = e(b-a), or e = b - a. Substituting that back in we get (c, d) = (a + b - e, b - b + a) = (b, a).
Nelson Elhage also pointed out that the question has a geometric interpretation: “are there two distinct non-congruent rectangles with the same perimeter and area?”
What if we try this with two distinct triples, i.e. six distinct numbers? I don’t want to write fifteen not-equals statements so let’s pull in some syntactic sugar:
IntVector('lhs', 3)returns the list of variables[lhs__0, lhs__1, lhs__2].SumandProductdo exactly what you’d expect.2Distinct(l)says that all the values inlmust be unique.
With that we can make the script much more scalable:
from z3 import *
n = 3
lhs = IntVector("lhs", n)
rhs = IntVector("rhs", n)
s = Solver()
for x in lhs + rhs:
s.add(x >= 0)
s.add(sum(lhs) == sum(rhs))
s.add(Product(lhs) == Product(rhs))
s.add(Distinct(lhs + rhs))
if s.check() == sat:
m = s.model()
l = [m[l].as_long() for l in lhs]
r = [m[r].as_long() for r in rhs]
print(l, r)
print(f"sum={sum(l)} prod={Product(l)}")
else:
print("unsat")
This time we do find a pair of triples: [7, 8, 15] and [14, 6, 10] have a sum of 30 and a product of 840.
If you run this you may get a different example, because we just asked for any satisfying models. We can find other models by adding new constraints, or change the solver to an optimizer that can minimize the found sum or product.
- s = Solver()
+ s = Optimize()
+ s.minimize(Sum(lhs))
This returns [8, 2, 9] and [4, 12, 3], with a sum of 19 and a product of 144.
Minimizing Annual Contributions
Z3’s optimizer takes a long time compared to dedicated constraint solving tools like MiniZinc, but we can still do some simple problems with it.
A bank account offers 1% annual percentage yield. What’s the minimum you have to deposit each year to have 10,000 after 20 years? Assume deposits happen after interest is paid.
So in the first year we have C dollars, in the second year we have C*1.01 + C dollars, etc. Now in SMT we can’t reassign the values of constants, but here’s where the python abstraction leaks a little bit. Say we write
x = Int('foo')
x = x + x
s.add(x < 6)
It looks like we assign the Python variable x to the SMT constant foo, but x is actually assigned to the expression “the value of foo”. The the line x = x + x assigns x to the expression “the value of foo + the value of foo”, meaning the line after adds the constraint foo + foo < 6.
This means we can compound interest like this:
years = 20
goal = 10000
r = RealVal('1.01')
c = Real('c')
balance = c
for t in range(1, years):
balance = balance*r+c
This time we’re using Real instead of Int. A Real is kinda like an infinite precision floating point.3 r needs to be a RealVal because it’s a fixed value and not something Z3 must solve for. This will calculate the final value of balance after 20 years.
All this said, this approach means we can’t know what value balance is in year 14. If we want the intermediate results, I can make balance an array of 20 reals, where balance[0] is the starting value and balance[n] == balance[n-1]*r + c. Full model:
from z3 import * # type: ignore
years = 20
goal = 10000
r = RealVal('1.01')
balance = RealVector('balance', years)
c = Real('c')
opt = Optimize()
opt.minimize(c)
opt.add(balance[0] == c)
for t in range(1, years):
opt.add(balance[t] == balance[t-1] * r + c)
if opt.check(balance[years-1] >= goal) == sat:
m = opt.model()
print(f"c = {m[c].as_decimal(2)}")
for x, i in enumerate(balance):
print(f"balance[{x}] = {m[i].as_decimal(2)}")
else:
print("unsat")
This finds the optimum deposit to be C=453.4. Optimization isn’t a popular use of Z3 because it’s so slow but it’s still a pretty cool feature.4
This also shows a useful trick: we can call check() with additional constraints. I didn’t know this was possible until fairly recently! This seems useful if we want to solve many instances of the same problem, but from what I’ve seen people prefer to instantiate new solver objects each time, not ‘parameterize’ one solver with new constraints.
Reverse Engineering an RNG
One thing I really want to do with the book is make sure each topic that I introduce has a useful application. It doesn’t have to be something that appeals to every single programmer, just be useful enough so people think “I can see how learning this could benefit somebody out there.” This led me to write an example about reverse engineering the values of an RNG.
Most random number generators in software are actually pseudorandom: they use a deterministic mathematical algorithm to generate a sequence of numbers that is “random enough” for most use cases. I talk about this more here. One such algorithm is the Linear Congruential Generator, or LCG. The LCG has fixed constants (a,c,m), a starting seed X, and computes the next value like this:
X = (a * X + c) % m
For example, if a = 6, m = 59, c = 0, the sequence from 1 would go 1, 6, 36, 39, 57, etc. Most sequences use much higher values, though. Let’s write an SMT problem that takes a sequence and m and recomputes (a, c).
from z3 import *
s = Solver()
modulus = 2**31
sequence = [4096, 618876929, 113892918, 1048278319]
a = Int('a')
c = Int('c')
s.add(0 <= a, a < modulus)
s.add(0 <= c, c < modulus)
for i in range(len(sequence) - 1):
s.add(sequence[i+1] == c + (a * sequence[i]) % modulus)
if s.check() == sat:
model = s.model()
print(f"a = {model[a]}")
print(f"c = {model[c]}")
This returns a = 22695477, c = 1, the RNG of the old Borland C++ compiler. This is just a toy example but the same approach also works on linear-feedback shift registers and Mersenne Twister.
Proving Theorems
The Z3 Repo calls Z3 a theorem prover. That means it should be able to prove theorems true in mathematics.
This works based on a property called “logical duality”. Take the theorem “addition is commutative”: all a, b in Real: a + b = b + a. This is logically equivalent to saying “there does not exist a, b where !(a + b == b + a)”. So we can ask Z3 to solve for the negation. If it can’t find a counterexample, then the theorem is true.
from z3 import *
s = Solver()
a, b = Reals('a b')
theorem = a + b == b + a
if s.check(Not(theorem)) == sat:
print(f"Counterexample: {s.model()}")
else:
print("Theorem true")
# Prints Theorem true
Usually theorems have conditions, like “if a != 0, then there is some b where a * b == 1”. The mathematician would write this as a != 0 => some b: a*b == 1. In Z3, p => q is written Implies(p, q), which we negate as normal.
from z3 import *
s = Solver()
a, b = Reals('a b')
theorem = Implies(a != 0, Exists(b, a*b == 1))
if s.check(Not(theorem)) == sat:
print(f"Counterexample: {s.model()}")
else:
print("Theorem true")
This returns “Theorem true”. Being able to prove theorems makes SMT solvers absolutely indispensable to formal verification. If you can convert a programming function to a set of Z3 equations you can prove that the function has expected properties.
Picking Stocks
At the same time I was developing all of these Z3 examples, I was writing examples for generalized constraint solvers like MiniZinc. I shared some of the chaff from that on my newsletter. I also formalized a couple of them as Z3 practice, and one stood out as being interesting:5
Given a list of stock prices through the day, find maximum profit you can get by buying one stock and selling one stock later.
This seems simple enough:
arr = [3,1,4,1,5,9,2,6,5,3,5,8]
buy, sell = Ints('i j')
opt.add(buy < sell)
opt.maximize(arr[sell] - arr[buy])
And this gives us:
TypeError: list indices must be integers or slices, not ArithRef
The problem is Z3 can’t use SMT variables to index Python arrays. Instead, Z3 has a “theory of arrays”, which means that you can add arrays as Z3 variables:
from z3 import *
opt = Optimize()
arr = [3,1,4,1,5,9,2,6,5,3,5,8]
arr_z3 = Array('A', IntSort(), IntSort())
for i, v in enumerate(arr):
opt.add(Select(arr_z3, i) == v)
buy, sell = Ints('i j')
opt.add(buy < sell)
opt.maximize(Select(arr_z3, sell) - Select(arr_z3, buy))
if opt.check() == sat:
m = opt.model()
buy_val = m.evaluate(arr_z3[buy])
sell_val = m.evaluate(arr_z3[sell])
profit = m.evaluate(sell_val - buy_val)
print(f"buy at {m[buy]} for {buy_val}")
print(f"sell at {m[sell]} for {sell_val}")
print(f"profit: {profit}")
So far, so good. But now we get a new problem:
buy at 21237 for -2
sell at 21238 for 0
profit: 2
To understand what’s going on, we have to understand what an SMT array actually “is”. An array is actually closer to a key-value map: it takes a key “sort” (basically a type) and a value sort. So our “array” arr_z3 maps integers to integers. The array must also have a Select(A, i) and a Store(A, i, val) operator (which returns a new array).6 SMT operations must be total for all possible inputs, meaning Select(A, i) must return a value no matter what i is. This all means that arrays don’t have a notion of “length” or “bounds” besides what we intentionally constrain. This is a valid Z3 program:
arr = Array('A', IntSort(), IntSort())
s = Solver()
s.check(Select(arr, 3) > Select(arr, -9))
print(s.model())
For me it outputs [A = Store(K(Int, -1), 3, 0)], which means “the array that maps every integer to -1 except for 3, which it maps to 0”.
That all explains why Z3 was able to buy stocks at 21237. To model our problem properly, we have to restrict buy and sell to the range of indices where we explicitly defined values.
opt.add(buy < sell)
+ opt.add(0 <= buy, sell < len(arr))
This now correctly returns a max profit of 8.
(What could possibly be the use of infinite arrays? Well, an array that maps every string to an integer is equivalent to a String -> Int function. This means Z3 can find functions that satisfy constraints!It’s not something I ever used in anger, though.)
What I put in the book
I had three goals for my SMT examples: they should be understandable to someone who is new to logic, they should look like a practical problem some readers might see in their jobs, and SMT solvers should be the right tool for solving them. And most of the examples above fail at least one of these:
- Nobody needs to find number triplets as part of their job.
- To demo reversing an RNG, I’d have to spend a bunch of time explaining PRNGs and LCGs.
- Conventional constraint solvers (covered in the same chapter) can optimize numerical problems much faster than an SMT solver can.
In the end I settled on three examples:
- An optimization problem that constraint solvers couldn’t do. Most solvers only work on numbers, so “find the largest common substring” is a good choice.
- Proving a very simple math property, mostly to motivate (3)
- Formally verifying a simple Python function.
I also wrote a quick script showing that Z3 could solve constraints on “machine integers”, which are represented with bitvectors.
We’ll see from feedback if these are actually good choices!
Other examples and Z3 Resources
I have a couple more examples I didn’t do for the book:
Other people’s examples:
- Andrew Helwer’s Firewall Equivalence
- Nelson Elhage’s Crossword Solver
- Hakan Kjellerstrand’s puzzles and toy models
More resources:
- The inventor’s z3py tutorial
- Microsoft Research’s Online Z3 guide. The SMT-LIB tutorial also shows modern Z3 features not supported by the Python SDK.
- The SMT-Lib Standard that Z3 follows.
Thanks to Nelson Elhage for feedback. If you liked this post, come join my newsletter! I write new essays there every week.
My book Logic for Programmers is now content-complete! While I wait for feedback from the technical reviewer, you can get the current beta and future improvements for 20% off here.
- The Python API is a loose layer that compiles to smtlib. If you want to see the actual thing fed into the solver, just append
print(s.sexpr())to the end of the script. [return] - Note these are higher-level convenience functions provided by the Python API. The actual generated Z3 just inlines
Prod(lhs)to(* (lhs__0 lhs__1 lhs__2)). [return] - To be pedantic, it’s actually an algebraic. You can shove the root of 2 in a Real but not, like, Pi. [return]
- Z3’s core engine is in C++, and yet a hand-written Python binary search finds the optimal
cabout a 1000x faster! Optimization in Z3 is real slow. [return] - Between that newsletter and this post Abdul Rahman Sibahi wrote A Dumb Introduction to z3, which has a Z3 conversion of the newsletter’s change counter problem. So I didn’t have to write that one! [return]
- The Python API desugars
A[i]toselect(A, i), meaning we could write the optimization goal asarr_z3[sell] - arr_z3[buy]and the constraint loop as[arr_z3[i] == arr[i] for i in range(len(arr))]). I explicitly usedselectfor transparency. [return]