Many bugs are implementation errors: there is a mistake in the code that makes it not do what you wanted it to do. For example, you may have accidentally left out the “list is empty” case, or written a nonterminating function. You can identify it as “definitely wrong” for a given input. Most testing, in fact most writing on software correctness, deals primarily with implementation errors.
Above that we have specification errors. The code perfectly matches your design, but your design doesn’t satisfy your requirements. Something like “we didn’t specify what happens if you load the same record twice.” These can be tougher to find than implementation errors. They span the code and the design, not just the code. Most of my writing focuses on specification errors.
Above that we have requirement errors. The design satisfies the requirements, but you have the wrong requirements entirely. Maybe it alerts when it detects an anomaly, but the client really wanted it to log. Requirement errors can be the most difficult to “debug”. Adding the client causes all sorts of logistical problems, just one of which is requirement ambiguity. A client might not even realize they wanted something until they try the product!
This is why so many Agile methods emphasize short sprints and prototyping. It reduces the time between getting requirements and finding issues in them. I wondered if tooling can also help here. Maybe it can catch particular types of errors earlier in the project, even before you prototype. I started a couple experiments on this and so far they’re cautiously promising.
This post focuses on identifying emergent ambiguity (EA): where the rules miss a given case. This can happen when there are many rules with overlapping domains: if flags A, B, and C can all potentially apply to a situation, you have 8 possible combinations of 3 flags.[1] It’s easy to miss specifying what should happen in one of them. EA seems like one of the “easier” errors to find. If you have a finite list of rules, you can enumerate every combination and ask the client to fill in any gaps.
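To make the "enumerate every combination" idea concrete, here is a minimal sketch. The flag names come from the example above, but the `spec` dictionary and its behaviors are made up purely for illustration; nothing here is taken from a real system.

```python
from itertools import product

FLAGS = ("A", "B", "C")  # the three flags from the example

# Hypothetical, deliberately incomplete spec:
# maps (A on?, B on?, C on?) to whatever behavior the client agreed to.
spec = {
    (False, False, False): "do nothing",
    (True, False, False): "apply A",
    (False, True, False): "apply B",
    (False, False, True): "apply C",
    (True, True, False): "apply A, then B",
}


def missing_cases(spec, n_flags):
    """Return every on/off combination of flags that the spec does not cover."""
    return [
        combo
        for combo in product((False, True), repeat=n_flags)
        if combo not in spec
    ]


all_combos = list(product((False, True), repeat=len(FLAGS)))
gaps = missing_cases(spec, len(FLAGS))

for combo in gaps:
    active = [name for name, on in zip(FLAGS, combo) if on]
    print("unspecified:", " + ".join(active))
```

Each line it prints is a question to take back to the client.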
Let’s see this in action. We’ll use the Gilded Rose kata as an example because that’s what got me thinking about this in the first place.
Hi and welcome to team Gilded Rose. As you know, we are a small inn with a prime location in a prominent city ran by a friendly innkeeper named Allison. We also buy and sell only the finest goods. Unfortunately, our goods are constantly degrading in quality as they approach their sell by date. We have a system in place that updates our inventory for us. It was developed by a no-nonsense type named Leeroy, who has moved on to new adventures. Your task is to add the new feature to our system so that we can begin selling a new category of items. First an introduction to our system:
Pretty simple, right? Well this is where it gets interesting:
We have recently signed a supplier of conjured items. This requires an update to our system:
Feel free to make any changes to the `UpdateQuality` method and add any new code as long as everything still works correctly. However, do not alter the `Item` class or `Items` property as those belong to the goblin in the corner who will insta-rage and one-shot you as he doesn’t believe in shared code ownership (you can make the `UpdateQuality` method and `Items` property static if you like, we’ll cover
Just for clarification, an item can never have its Quality increase above 50, however “Sulfuras” is a legendary item and as such its Quality is 80 and it never alters.
We’re presented with an existing system that satisfies these requirements, but terribly. Nested if statements and all that fun. In order to add the new feature, we have to refactor the existing code. This pushes the practitioner to write tests that ensure the existing behavior is unchanged.
It’s not enough to just write tests that conform to the requirements, though. This is because the requirements are incomplete. In particular:
- When we say “lowers both values”, how much do we lower by? Does Quality decrease by 1, 2, 1.5?
- Are item types exclusive? Can something be both “backstage passes” and “aged brie”?
- Do we decrement quality and sell-by in an order, or simultaneously? Depending on our choice here this can affect the value of the item at boundary times, like `sell_by = 0`.
The provided implementation implicitly answers all three: items are at most one additional type, we lower values 1 at a time, and we do this weird thing where we modify quality twice, both before and after the sell-by calculation. These are genuine ambiguities, but not the particular kind of EA I care about right now.
We’ll look instead at the value calculation logic. We have seven rules for determining the change in quality for an item. For the requirements to be complete, we should know all possible ways they can interact. The easiest way to do that is to construct a decision table.
Decision tables map a set of finite enumerations to outputs, where every possible combination of inputs is represented in exactly one row. If there are any missed requirements, they will correspond to a missing row. If there are contradictory requirements, there will be two rows with the same input and different outputs. Here the inputs would be the item properties and the output will be the change in item quality.[2]
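Both checks are mechanical, so they can be sketched in a few lines. This is only an illustration: `check_table`, the toy `domains`, and the toy `rows` are hypothetical names and data, not the tables from this post.

```python
from collections import defaultdict
from itertools import product


def check_table(domains, rows):
    """Check a decision table for gaps and contradictions.

    domains: dict mapping input name -> list of allowed values
    rows:    list of (inputs_tuple, output) pairs, inputs in domain order
    Returns (missing, contradictory).
    """
    outputs_by_input = defaultdict(set)
    for inputs, output in rows:
        outputs_by_input[inputs].add(output)

    # A missed requirement shows up as an input combination with no row.
    missing = [
        combo for combo in product(*domains.values())
        if combo not in outputs_by_input
    ]
    # A contradiction shows up as one input combination with several outputs.
    contradictory = {
        combo: outs for combo, outs in outputs_by_input.items() if len(outs) > 1
    }
    return missing, contradictory


# Toy table (made up for this sketch): two item types, overdue or not.
domains = {"type": ["S", "M"], "overdue": [False, True]}
rows = [
    (("S", False), "q"),
    (("S", True), "q"),
    (("M", False), "q-1"),
    (("M", False), "q-2"),  # same inputs, different output: a contradiction
    # ("M", True) is never specified: a gap
]

missing, contradictory = check_table(domains, rows)
```

Note that this sketch does not complain about a row repeated with the same output, which a stricter reading of "exactly one row" would also reject.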
Let the initial quality be `q`. We can decompose the problem into two decision tables. The first determines `q'`, the new value if we don’t restrict quality to the `0-50` range. The second determines `final`, which is the clamped value of `q'`. `final` then becomes the new quality.
The clamp table is pretty easy:
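As a rough sketch of that clamping step in code: the function name `clamp` and the single-letter type codes are assumptions for illustration, and treating Sulfuras as a fixed 80 is my reading of the "never alters" clarification rather than anything lifted from the original implementation.

```python
def clamp(item_type: str, q_prime: int) -> int:
    """Turn the unclamped value q' into the final quality.

    item_type is a one-letter code: "S" for Sulfuras, anything else otherwise.
    """
    if item_type == "S":
        # Assumed: Sulfuras is legendary, its quality is 80 and never alters.
        return 80
    # Every other item stays inside the 0-50 range.
    return max(0, min(50, q_prime))
```

For example, `clamp("M", -2)` gives `0` and `clamp("B", 51)` gives `50`.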
Now for the input table. We have two enumerable inputs:
- `type` is either (S)ulfuras, (B)rie, (P)ass, or (M)isc.
- `days_left` is four ranges, the last being `10-`. Since these full ranges only matter for the Pass, in the rest of the cases I’ll instead use `<0` and `>=0`. In a real problem I wouldn’t do this, but it makes showing the concept easier here.[3]
We have one emergent ambiguity: what happens when Brie becomes outdated? We have “Brie increases with quality over time” and “outdated items lose quality twice as fast.” How are these supposed to interact? I can see a few different ways the client might want this to go:
- Brie rule overrides sell-by rule: `q'=q+1`. This is what the standard implementation does.
- Sell-by rule overrides brie rule.
- When the client said “lose quality twice as fast”, they meant “degrades one step faster”, which is the same result for misc items.
- The client meant “changes twice as fast”, regardless of the direction.
- We should handle Brie in a special way not currently covered by these rules.
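One way to see how far apart these readings are is to write each one as a function from `q` to the unclamped `q'`. The numbers below are my assumptions about what each bullet would mean in practice (for instance, that "sell-by overrides brie" means losing 2 per day), so treat this as a sketch of the disagreement, not as the kata's answer.

```python
# Candidate readings of "overdue Brie", each mapping q to the unclamped q'.
# Every formula here is an assumed interpretation of the wording above.
interpretations = {
    "brie overrides sell-by": lambda q: q + 1,
    "sell-by overrides brie": lambda q: q - 2,      # assumed: loses 2 per day
    "degrades one step faster": lambda q: q + 1 - 1,  # assumed: +1 for Brie, -1 for overdue
    "changes twice as fast": lambda q: q + 2,
}

q = 10
results = {name: rule(q) for name, rule in interpretations.items()}

for name, q_prime in results.items():
    print(f"{name}: q' = {q_prime}")
```

Under these assumptions all four readings give a different `q'` for the same starting quality, so a test suite written against one of them would fail the other three.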
None of these seem particularly unlikely. Some seem more likely than others, but none of them trigger a “that’s stupid” feeling in me.
The New Rule
Then the kata adds a new rule:
- “Conjured” items degrade in Quality twice as fast as normal items
Is “conjured” its own type? Maybe, maybe not. If the client ends up saying “conjured” is not its own type, and can be added to any item, we get the new table:
Going down the list:
- If you conjure a ticket, what’s the new value? Does it still gain value over time? Does it even have value at all? Maybe it’ll be considered counterfeit! No matter what, though, we can safely assume it goes to zero after the due date. There’s no reasonable reading of the rules that implies otherwise.
- Does conjured brie gain quality or lose quality? This is the same problem as with overdue unconjured brie.
- What happens when conjured brie goes overdue? Now instead of two intersecting rules, you have three.
- What happens when a miscellaneous conjured item becomes overdue?
That last one interests me the most because it’s a “common” case. You might argue that “overdue conjured aged brie” is an edge case and it’s normal to be ambiguous about edge cases. But “overdue conjured item” might happen all the time. It even happens if “conjured” is its own type! The answer is not self-evident, as there are at least two meaningful interpretations:
- The client intended the penalties to multiply, so now it’s losing quality four times as fast.
- The client wanted the penalties to each be applied and then aggregated. So “overdue” applies `-1` and “conjured” applies `-1`, giving us `q'=q-3`.
- Something else.
In my opinion the first choice seems the most reasonable. But the client decides that, not me! If they expected it to be `q'=q-3`, then implementing `q'=q-4` is a requirements error. And I know of at least one case where “apply-then-aggregate” was what the client decided.
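The two readings can be sketched side by side for a misc item. The function names `multiply` and `aggregate` and the constant `BASE_LOSS` are hypothetical; the base loss of 1 per day is the value the provided implementation implies.

```python
from itertools import product

BASE_LOSS = 1  # a misc item loses 1 quality per day


def multiply(overdue: bool, conjured: bool) -> int:
    """Reading 1: each penalty doubles the rate of loss."""
    loss = BASE_LOSS
    if overdue:
        loss *= 2
    if conjured:
        loss *= 2
    return loss


def aggregate(overdue: bool, conjured: bool) -> int:
    """Reading 2: each penalty adds one extra point of loss."""
    return BASE_LOSS + (1 if overdue else 0) + (1 if conjured else 0)


# Find the input combinations where the two readings disagree.
disagreements = [
    (overdue, conjured)
    for overdue, conjured in product((False, True), repeat=2)
    if multiply(overdue, conjured) != aggregate(overdue, conjured)
]
```

The readings agree whenever at most one penalty applies and only split when both do. That is why neither rule looks ambiguous on its own: the ambiguity only shows up at the intersection.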
I get three impressions from this exercise:
There are tangible benefits to modeling requirements. “Gilded Rose” is a very small kata, something that’s not supposed to take more than an hour or two. And even it has emergent ambiguity. It’s also likely the author didn’t intend this ambiguity, as they framed the kata as a “refactoring” problem. There are requirement issues that they may have missed and that we found via formal modeling.
Decision tables are a good way of modeling requirements. Their simplicity shines here. It took me about as much time to write those DTs as it did to write and edit this paragraph. They can be understood by anyone, even without prior exposure. And with a few minutes of training, anybody can write one. You can show them to clients and they’ll know what’s going on.
Decision tables aren’t the most powerful way of modeling requirements. I was fairly lucky with this problem: it didn’t involve complex state, input ordering, anything that was out of DT scope. And even then I stretched a lil’ bit with the date ranges. DTs are great because they have such incredibly high strength/weight ratios. More powerful tools are harder to learn and apply than DTs are. But it makes me optimistic that we can push this further.
1. If they’re order-dependent (applying A and B is different from applying B and A) then there are 16.
2. Applying a decision table would be much harder if the rules were order-dependent, but thankfully they’re not here.
3. This might have an off-by-one error from the original implementation, but again, that’s not as important as discussing the core idea here.