# Hypothesis Testing with Oracle Functions

Posted on

This post is about the Hypothesis testing library. If you’re unfamiliar with it, check it out! Property tests are a fantastic addition to your testing suite. All examples use pytest.

Imagine we’ve written a bubblesort:

``````def bubblesort(l: List[int]) -> List[int]:
# blah blah blah
``````

What are some of the properties we could test? We could check that the input has the same length as the output:

``````from hypothesis import given
from hypothesis.strategies import lists, integers

@given(lists(integers()))
def test_same_length(l):
assert len(bubblesort(l)) == len(l)
``````

Or we could test that the sort is idempotent:

``````@given(lists(integers()))
def test_idempotent(l):
assert bubblesort(bubblesort(l)) == bubblesort(l)
``````

But neither test really gets at what we want: confirming the list is actually sorted. Fortunately, though, there’s a simple hypothesis test for that:

``````@given(lists(integers()))
def test_sorts(l):
assert bubblesort(l) == sorted(l)
``````

We just need to check that the output of the function matches the output of a sorting function! In this case, `sorted` is an oracle, something we know is correct and can compare our solution to. This may look like cheating (“if we already have a sorting function, why would we be writing another?”), but this can be a surprisingly powerful testing strategy. Here are some common use-cases:

### Refactoring with Oracles

One use case for an oracle is when you refactor an existing function: you want to clean up or optimize something you’re confident is correct and want to make sure your changes don’t break anything. Let’s take a toy example:

``````def is_prime(x: int) -> bool:
if x < 2:
return False
for i in range(2, x):
if x % i == 0:
return False
return True
``````

A common optimization we can do is only check up to the square root of the number:

``````from math import ceil, sqrt

def new_is_prime(x: int) -> bool:
# blah blah blah
for i in range(2, ceil(sqrt(x))):
# sup
``````

This seems like it works for simple test cases, but if we compare it to our original function, we find an error:

``````@given(integers())
def test_matches_oracle(x):
assert is_prime(x) == new_is_prime(x)
``````

This error comes from how `range` works: `list(range(2, 2)) == []`, so our function fails for 4. We found this by using our original version as an oracle against the refactored version.

### Special-Case Oracles

Sometimes we know our function works in specific cases but does not generalize. For example, the following function ostensibly takes a list and returns the highest possible product of three elements:

``````# We'll assume that all inputs will have more than 3 elements
# So no need for a bounds check
def buggy_max_triplet_prod(l: List[int]) -> int:
a, b, c = sorted(l, reverse=True)[0:3]
return a * b * c
``````

This is correct when all of the elements are positive. However, it fails for `[-4, -3, 1, 2]`, returning -6 instead of 12. We can still use this as an oracle, though. Since we know it’s correct for certain lists, our final function must match its output for those lists.

``````@given(lists(integers(min_value=0), min_size=3)) # only positive numbers
def test_matches_on_positive(l):
assert max_triplet_prod(l) == buggy_max_triplet_prod(l)
``````

Over our restricted domain, the broken function is correct, and can be used as an oracle.

### Inefficient Oracles

Sometimes there’s a solution to a problem that we know for sure works but is too slow/memory-intensive/etc for production code. We can use that solution as an oracle to test a more sophisticated function. If you wanted to write a function to insert into a sorted array, you could use the following as a reference oracle:

``````def ordered_insert_oracle(new_element: int, ordered_list: List[int]) -> List[int]:
return sorted(ordered_list + [new_element])

@given(lists(integers), integers())
def test_ordered_insert(l, x):
assert ordered_insert(l, x) == ordered_insert_oracle(l, x)
``````

The oracle will be O(n*log(n)), while a well written ordered insert will be O(n). That might matter in production, but it won’t matter too much in testing, so we can benefit from an oracle here.

### Reverse Oracles

Sometimes we can go the other way: instead of writing an oracle that will always get the right answer, we can write a function that takes an answer and finds a question for it. This gets a bit complicated, so let’s walk through an example. We’re writing our own counter.

``````from random import shuffle

@given(lists(tuples(integers(), integers(min_value=1, max_value=10)), unique_by=lambda x: x))
def test_counter(l):
test_list = []
for val, times in l:
test_list += [val] * times
shuffle(test_list)
assert count(test_list) == {val:times for val, times in l}
``````

Here, we’re taking a possible answer and generating a list that matches that. For example, if our test generates `l = [(5, 2), (1, 1)]`, that means we want our function to return `{5: 2, 1: 1}`. So we find a list that, if `count` is written correctly, would match that output. Our implementation is to first create the list `[5, 5, 1]` and then shuffle it.

This test ends up being more complicated than the others because it’s easier for the test itself to have bugs. Note the `unique_by`; that’s because our test will bug if we generate `l = [(1, 1), (1, 1)]`: we’ll assert `{1: 2} == {1: 1}`.

## Caveats

Most of these caveats apply to property-based tests (PBTs) in general.

• You’ll want to write some unit tests to sanity-check your oracles. This throws you down a rabbit hole (“I have to test my tests?!”), but you get much wider test coverage with PBTs in return for the extra complexity.
• Sometimes coming up with the oracles can be as or more difficult than writing the final functions. As always, writing good code involves tradeoffs.
• The further you get from pure functions the harder writing PBTs gets. Reverse oracles help a bit here.
• When writing bullet points like this I’m always tempted to add something completely unrelated. Here’s something I wrote about how much I like dogs.
• Don’t try to be overly clever and use oracles in cases where more basic PBTs (or even unit tests) suffice.
• The benefits PBT provides are orthogonal to the benefits you get from unit tests, type systems, formal methods, and getting a full night’s sleep. Ideally you’d want to have all five.
• Ruby doesn’t have a good PBT library and that makes me sad.