sam-b.github.io

A few simple examples of using Z3 for constraint solving

A bug hunters diary / Lazy math

All the code for this post can be found on my Github Account.

While reading A Bug Hunters Diary the other day I came across a problem the author had in chapter 7 (investigating an XNU kernel bug). The author could control some data which was then used as the address passed to a call instruction. The data went through some basic transforms before it was used as an address, so in order to get it to be an address which the author/attacker controlled they wrote a simple script which brute forced the value they should place in that location as shown below.

#include <stdio.h>

#define MEMLOC 0x0041bda0
#define SEARCH_START 0x80000000
#define SEARCH_END 0xffffffff

int main(void) {

	unsigned int a,b = 0;

	for(a = SEARCH_START; a < SEARCH_END; a++ ){
		b = (a << 5) + 0x456860;
		if(b == MEMLOC){
			printf("Value: %08x\n", a);
			return 0;
		}
	}

	printf("No valid value found.\n");
	return 1;
}

This seemed like a good toy example to give an SMT solver a quick try. An SMT solver

This seemed like a perfect opportunity to make use of an SMT solver (see: https://doar-e.github.io/presentations/securityday2015/SecDay-Lille-2015-Axel-0vercl0k-Souchet.html#/ for the best intro I’ve found), SMT solvers allow us to express ‘symbolic’ variables and constraints on their values. Then if its possible to find any values which satisfy the constraints, it allows us to access a model which includes possible values which satisfy them. While this particular problem is a trivial example, it is a nice short and simple opportunity to illustrate how to make use of a pretty opaque tool.

In this case I chose to use the Z3 constraint solver/SAT solver from Microsoft, which is open source and has great python bindings. First of we need to import Z3 and read in the target value from the command line:

from z3 import *
import sys

if __name__ == "__main__":
	target = int(sys.argv[1],16)

Next we need to declare a symbolic variable which is equivalent to the ‘a’ variable in the earlier code, this will be the value which once transformed matches the target value.

	x = BitVec('x', 32)

Now we create a Z3 solver object and add the constraints we know to it, that the value should be between 0x80000000 and 0xffffffff and that when left shifted by 5 with 0x456860 added to it, it should equal the target value.

	s = Solver()
	s.add(target == (x << 5) + 0x456860)
	s.add(x > 0x80000000, x < 0xffffffff)

Finally we check that our constraints can be satisfied and if so extract the value for x from the model Z3 creates for us.

	if s.check():
		print hex(int(str(s.model()[x])))
	else:
		print "unsat :("

Obviously we could just do the math but that’s far less interesting and as you can see below using Z3 instead of brute forcing the value saves us a valuable 0.4 seconds :D

runtime comparison

Hackvent 2016 Day 15

All the code for this post can be found on github.

Hackvent consists of a series of hacking challenges released one per day in the run up to Christmas, hence the name…

I’ve been too busy to put much time in this year but the challenge for day 15 was designed to be solved with Z3 and once a friend linked me to it (http://hackvent.hacking-lab.com/challenge.php?day=15), I was well and truly nerd sniped and had to solve it.

The challenge starts by telling us:

We've captured a strange message. It looks like it is encrypted somehow ...
iw, hu, fv, lu, dv, cy, og, lc, gy, fq, od, lo, fq, is, ig, gu, hs, hi, ds, cy, oo, os, iu, fs, gu, lh, dq, lv, gu, iw, hv, gu, di, hs, cy, oc, iw, gc


We've also intercepted what seems to be a hint to the key:
bytwycju + yzvyjjdy ^ vugljtyn + ugdztnwv | xbfziozy = bzuwtwol
    ^         ^          ^          ^          ^
wwnnnqbw - uclfqvdu & oncycbxh | oqcnwbsd ^ cgyoyfjg = vyhyjivb
    &         &          &          &          &
yzdgotby | oigsjgoj | ttligxut - dhcqxtfw & szblgodf = sfgsoxdd
    +         +          +          +          +
yjjowdqh & niiqztgs + ctvtwysu & diffhlnl - thhwohwn = xsvuojtx
    -         -           -         -          -
nttuhlnq ^ oqbctlzh - nshtztns ^ htwizvwi + udluvhcz = syhjizjq
    =         =           =         =          =         
fjivucti   zoljwdfl   sugvqgww   uxztiywn   jqxizzxq

It also gives us a few hints:

Notes
  • assume q != 0
  • a letter is a decimal digit is a letter
  • each digit has exactly two different letter representations
  • C-like operator precedence
  • To be clear: abcd = a*1000 + b*100 + c*10 + d*1
  • Hint 1: Z3 solver can save your day!

The hint telling us to use Z3 makes it clear that we need to treat this as a constraint solving problem, using the hints and equations we have been given we can create constraints on symbolic representations of the letter variables and then Z3 can find valid values for them.

Once I had worked out the approach needed the first thing to do was generate the letter variables and all those expanded variables, I decided to automate this with a simple python script.

First of all I copy pasted the grid of equations and deleted stuff and added quotes and commas until I had a python list of all the expanded variable names as strings:

var = ['bytwycju','yzvyjjdy','vugljtyn','ugdztnwv','xbfziozy','bzuwtwol',
'wwnnnqbw','uclfqvdu','oncycbxh','oqcnwbsd','cgyoyfjg','vyhyjivb',
'yzdgotby','oigsjgoj','ttligxut','dhcqxtfw','szblgodf','sfgsoxdd',
'yjjowdqh','niiqztgs','ctvtwysu','diffhlnl','thhwohwn','xsvuojtx',
'nttuhlnq','oqbctlzh','nshtztns','htwizvwi','udluvhcz','syhjizjq','fjivucti','zoljwdfl','sugvqgww','uxztiywn','jqxizzxq']

Next I wrote some code to iterate over every character in each string of the var list and create a Set of the unique letters used:

s = set()
for i in var:
	for c in i:
		s.add(c)

Once we have this we can write code to generate a symbolic variable for each of the one letter vars:

for c in s:
	print c + " = BitVec('" + c + "',32)"

Each letter is a declared as a 32 bit symbolic bit vector with corresponding name.

Next the code prints out an instantiation of a Solver() and begins adding the constraint from the earlier notes that each letter only has a value between 0 and 10:

print "solv = Solver()"

for c in s:
	print "solv.append(" + c + " < 10, " + c + " >= 0)"

With the exception of ‘q’ of course for which I just deleted the ‘=’ from the ‘>= 0’ part after copy pasting the scripts output into my actual solving script later on. Next I created all of the complex variables that appear in the simultaneous equations:

for i in var:
	out = ""
	out += i + "="
	i = list(reversed(i))
	for c in range(0,len(i)):
		out+= i[c] + "*1" + "0"*c +"+"
	print out[:-1]

From the rules given to us earlier, we know that ‘jqxizzxq’ should be equivalent to: ‘q*1+x*10+z*100+z*1000+i*10000+x*100000+q*1000000+j*10000000’ which is what this section of code prints out.

Now I can run this script, which I called generate.py, and it will output most of the code that is needed:

sam@ubuntu:~/code/hackvent$ python generate.py 
c = BitVec('c',32)
b = BitVec('b',32)
d = BitVec('d',32)
g = BitVec('g',32)
f = BitVec('f',32)
i = BitVec('i',32)
h = BitVec('h',32)
j = BitVec('j',32)
l = BitVec('l',32)
o = BitVec('o',32)
n = BitVec('n',32)
q = BitVec('q',32)
s = BitVec('s',32)
u = BitVec('u',32)
t = BitVec('t',32)
w = BitVec('w',32)
v = BitVec('v',32)
y = BitVec('y',32)
x = BitVec('x',32)
z = BitVec('z',32)
solv = Solver()
solv.append(c < 10, c >= 0)
solv.append(b < 10, b >= 0)
solv.append(d < 10, d >= 0)
solv.append(g < 10, g >= 0)
solv.append(f < 10, f >= 0)
solv.append(i < 10, i >= 0)
solv.append(h < 10, h >= 0)
solv.append(j < 10, j >= 0)
solv.append(l < 10, l >= 0)
solv.append(o < 10, o >= 0)
solv.append(n < 10, n >= 0)
solv.append(q < 10, q >= 0)
solv.append(s < 10, s >= 0)
solv.append(u < 10, u >= 0)
solv.append(t < 10, t >= 0)
solv.append(w < 10, w >= 0)
solv.append(v < 10, v >= 0)
solv.append(y < 10, y >= 0)
solv.append(x < 10, x >= 0)
solv.append(z < 10, z >= 0)
bytwycju=u*1+j*10+c*100+y*1000+w*10000+t*100000+y*1000000+b*10000000
yzvyjjdy=y*1+d*10+j*100+j*1000+y*10000+v*100000+z*1000000+y*10000000
vugljtyn=n*1+y*10+t*100+j*1000+l*10000+g*100000+u*1000000+v*10000000
ugdztnwv=v*1+w*10+n*100+t*1000+z*10000+d*100000+g*1000000+u*10000000
xbfziozy=y*1+z*10+o*100+i*1000+z*10000+f*100000+b*1000000+x*10000000
bzuwtwol=l*1+o*10+w*100+t*1000+w*10000+u*100000+z*1000000+b*10000000
wwnnnqbw=w*1+b*10+q*100+n*1000+n*10000+n*100000+w*1000000+w*10000000
uclfqvdu=u*1+d*10+v*100+q*1000+f*10000+l*100000+c*1000000+u*10000000
oncycbxh=h*1+x*10+b*100+c*1000+y*10000+c*100000+n*1000000+o*10000000
oqcnwbsd=d*1+s*10+b*100+w*1000+n*10000+c*100000+q*1000000+o*10000000
cgyoyfjg=g*1+j*10+f*100+y*1000+o*10000+y*100000+g*1000000+c*10000000
vyhyjivb=b*1+v*10+i*100+j*1000+y*10000+h*100000+y*1000000+v*10000000
yzdgotby=y*1+b*10+t*100+o*1000+g*10000+d*100000+z*1000000+y*10000000
oigsjgoj=j*1+o*10+g*100+j*1000+s*10000+g*100000+i*1000000+o*10000000
ttligxut=t*1+u*10+x*100+g*1000+i*10000+l*100000+t*1000000+t*10000000
dhcqxtfw=w*1+f*10+t*100+x*1000+q*10000+c*100000+h*1000000+d*10000000
szblgodf=f*1+d*10+o*100+g*1000+l*10000+b*100000+z*1000000+s*10000000
sfgsoxdd=d*1+d*10+x*100+o*1000+s*10000+g*100000+f*1000000+s*10000000
yjjowdqh=h*1+q*10+d*100+w*1000+o*10000+j*100000+j*1000000+y*10000000
niiqztgs=s*1+g*10+t*100+z*1000+q*10000+i*100000+i*1000000+n*10000000
ctvtwysu=u*1+s*10+y*100+w*1000+t*10000+v*100000+t*1000000+c*10000000
diffhlnl=l*1+n*10+l*100+h*1000+f*10000+f*100000+i*1000000+d*10000000
thhwohwn=n*1+w*10+h*100+o*1000+w*10000+h*100000+h*1000000+t*10000000
xsvuojtx=x*1+t*10+j*100+o*1000+u*10000+v*100000+s*1000000+x*10000000
nttuhlnq=q*1+n*10+l*100+h*1000+u*10000+t*100000+t*1000000+n*10000000
oqbctlzh=h*1+z*10+l*100+t*1000+c*10000+b*100000+q*1000000+o*10000000
nshtztns=s*1+n*10+t*100+z*1000+t*10000+h*100000+s*1000000+n*10000000
htwizvwi=i*1+w*10+v*100+z*1000+i*10000+w*100000+t*1000000+h*10000000
udluvhcz=z*1+c*10+h*100+v*1000+u*10000+l*100000+d*1000000+u*10000000
syhjizjq=q*1+j*10+z*100+i*1000+j*10000+h*100000+y*1000000+s*10000000
fjivucti=i*1+t*10+c*100+u*1000+v*10000+i*100000+j*1000000+f*10000000
zoljwdfl=l*1+f*10+d*100+w*1000+j*10000+l*100000+o*1000000+z*10000000
sugvqgww=w*1+w*10+g*100+q*1000+v*10000+g*100000+u*1000000+s*10000000
uxztiywn=n*1+w*10+y*100+i*1000+t*10000+z*100000+x*1000000+u*10000000
jqxizzxq=q*1+x*10+z*100+z*1000+i*10000+x*100000+q*1000000+j*10000000

I put a short stub of code in the file 15.py:

from z3 import *  
import sys

encrypted = "iw, hu, fv, lu, dv, cy, og, lc, gy, fq, od, lo, fq, is, ig, gu, hs, hi, ds, cy, oo, os, iu, fs, gu, lh, dq, lv, gu, iw, hv, gu, di, hs, cy, oc, iw, gc"

if __name__ == "__main__": 

Onto this I appended the output from generate.py, now I just needed to add the simultaneous equations as constraints and then parse the model generated by Z3 if the constraints can be satisfied.

I added the equations as constraints by just copy pasting first horizontal and then the vertical lines from the grid of equations into ‘solv.add()’ calls. I could have scripted this as well but it didn’t seem worth the effort when most of them can just be copy-pasted straight in:

solv.add(bytwycju + yzvyjjdy ^ vugljtyn + ugdztnwv | xbfziozy == bzuwtwol)

solv.add(wwnnnqbw - uclfqvdu & oncycbxh | oqcnwbsd ^ cgyoyfjg == vyhyjivb)

solv.add(yzdgotby | oigsjgoj | ttligxut - dhcqxtfw & szblgodf == sfgsoxdd)

solv.add(yjjowdqh & niiqztgs + ctvtwysu & diffhlnl - thhwohwn == xsvuojtx)

solv.add(nttuhlnq ^ oqbctlzh - nshtztns ^ htwizvwi + udluvhcz == syhjizjq)

solv.add(bytwycju ^ wwnnnqbw & yzdgotby + yjjowdqh - nttuhlnq == fjivucti)

solv.add(yzvyjjdy ^ uclfqvdu & oigsjgoj + niiqztgs - oqbctlzh == zoljwdfl)

solv.add(vugljtyn ^ oncycbxh & ttligxut + ctvtwysu - nshtztns == sugvqgww)

solv.add(ugdztnwv ^ oqcnwbsd & dhcqxtfw + diffhlnl - htwizvwi == uxztiywn)

solv.add(xbfziozy ^ cgyoyfjg & 	szblgodf + thhwohwn - udluvhcz == jqxizzxq)

Finally, I wrote code to first see if the constraints were satisfiable and if so, generate a model with the values for each letter in it. I then turned this into a python map which I use to convert the letters in the original ‘encrypted’ string into the corresponding integers. I guessed that the letter pairs would be ASCII values and it turned out I was correct. So I cast each character pair to an int after conversion and then cast that into a char, all of which are appended together before being printed:

model_map = {}
ascii = ""
if solv.check():
	model = solv.model()
	for i in model:
		model_map[str(i)] = str(model[i])
	for p in encrypted.split(', '):
		tmp = ""
		for c in p:
			tmp += model_map[c]
		ascii += chr(int(tmp))
	print ascii
else:
	print "unsat :("

Finally I ran the code (15.py) and it printed out the solution as expected.