The Maia Test Framework

The MTF does things differently from any software test framework that I'm aware of. However, it does this through necessity, because hardware and software testing are different. If you don't care about this (and there's no particular reason that you should), then I suggest skipping the rest of this page. You don't need to know anything here if you simply want to use (or even change) the MTF. If you want to know how, and why, the MTF differs from xUnit-style frameworks, then read on.

Existing frameworks that carry out software unit testing come with a lot of baggage, including no clear agreement on what a 'unit test' actually is. xUnit-style frameworks have a common heritage, however, and are fairly consistent in how the tests should be carried out, and what constitutes a 'passed' or 'failed' test. The MTF does not follow these conventions in two important areas:

  1. 'drive' statements are the primary test mechanism in the MTF, although assertions can be used if preferred. xUnit frameworks are assertion-based.
  2. xUnit frameworks record a test failure if an assertion fails, and a pass otherwise. The MTF instead requires the expected results to be recorded in a 'golden' logfile, and carries out an intelligent comparison against this file to determine success or failure.

These choices aren't arbitrary, and were required because hardware and software testing are different. The sections below compare hardware and software testing, and are primarily a justification for how the MTF does what it does, and why it's different from xUnit frameworks. You may or may not agree with the MTF philosophy. It doesn't particularly matter, because developing a test framework is not actually a great deal of work (compared to developing a compiler, for example), and the framework is easily changed. My own view is that the MTF is a good fit for hardware development and TDD, but it will probably get better.


1: Hardware test slow, software test fast

Hardware testing is slow. To run a given test, you have to invoke a simulator, which then has to compile your HDL code, and elaborate your design, before even starting the simulation. And it's not just the simulator setup and teardown which are the problem:

  • In a software environment you can put a large number of test methods within a given test class, and these can be compiled and run as a single unit.
  • Software tests are generally stateless. Hardware tests are generally stateful, and so require a large amount of additional setup

This may not seem particularly important, but software test frameworks are generally predicated on the ability to run hundreds of tests a second, while you would be lucky to run one test a second in a hardware framework. Integration tests can be far slower still: simply getting a small chip out of reset might take 10 minutes, for example.

Because hardware testing is slow, you need to think rather differently about what the framework does, what the basic test unit is, and how the tests are carried out. Note that the speed problem, of itself, doesn't mean that you can't carry out unit testing for hardware development. Unit testing was already common in the software world 20 years ago, and hardware test speeds today are probably much the same as software test speeds were then.

2: Hardware tests are stateful

Software testing tries to avoid 'stateful' tests (and some guidelines specifically state that unit tests should be stateless). In software testing, state makes life difficult, and you have to take additional action to confirm that your test always produces the same result when it is run. Hardware testing, on the other hand, is almost always about state: you need to confirm that the state changes correctly as time advances, with each new clock cycle. The only way to avoid state in hardware testing is to test purely combinatorial circuits, and these are likely to form a small part of your overall design.
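As a rough illustration of what checking state cycle by cycle involves, the Python sketch below models a hypothetical 4-bit counter with synchronous reset and checks its output after every clock edge. The `Counter4` class and the `run_test` helper are invented for this example; they are not part of Maia or the MTF.

```python
class Counter4:
    """Behavioural model of a hypothetical 4-bit counter with synchronous reset."""

    def __init__(self):
        self.q = 0  # register state, which persists between clock edges

    def clock(self, rst):
        """Apply one rising clock edge and return the new output."""
        self.q = 0 if rst else (self.q + 1) & 0xF
        return self.q


def run_test(dut, vectors):
    """Apply (rst, expected_q) vectors in order; return (passes, fails)."""
    passes = fails = 0
    for rst, expected in vectors:
        if dut.clock(rst) == expected:
            passes += 1
        else:
            fails += 1
    return passes, fails


# One reset cycle, then 17 counting cycles. The expected value on each cycle
# depends on every cycle before it, and wraps from 15 back to 0.
vectors = [(1, 0)] + [(0, n & 0xF) for n in range(1, 18)]
print(run_test(Counter4(), vectors))  # prints (18, 0)
```

The vectors cannot be reordered or run in isolation: each expected value is only correct given the cycles that came before it, which is the sense in which the test is stateful.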

3: Software testing is built on assertions

Software test frameworks are almost exclusively built using assertions. The general idea is that you test a single simple property of a unit (a class or method), and write a single assertion to test the correctness of that property. If you have thousands of properties to test, you write thousands of tests, instead of making thousands of assertions. Opinion differs on whether a test should really contain only a single assertion, however. Most guidelines state that you should have a single assertion, but real-life projects might instead include a handful in a given test. It is unusual to find assertions inside a loop, or even to find assertions in a control-flow construct; this violates the general principle of testing "one thing".

A given software test fails if an assertion fails and an exception is thrown; the exception is caught, and the test is flagged as failed. Opinion is again divided on how important it is to return a message to the test framework describing the failure.
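The mechanism described above can be sketched in a few lines of Python. This is a toy runner written for illustration only, not the implementation of any real xUnit framework; `run_tests`, `check_product` and `check_broken` are hypothetical names.

```python
def run_tests(tests):
    """Run each test function; a raised AssertionError marks it as failed."""
    results = {}
    for test in tests:
        try:
            test()
            results[test.__name__] = "pass"
        except AssertionError as err:
            # The optional assertion message is all the runner learns
            # about what went wrong.
            results[test.__name__] = f"fail: {err}"
    return results


def check_product():
    assert 3 * 4 == 12


def check_broken():
    assert 2 < 1, "2 is not less than 1"


results = run_tests([check_product, check_broken])
for name, status in results.items():
    print(name, status)
```

Note that the only record of a passing test is the word "pass": nothing about what was checked reaches the runner.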

Maia has an assert statement (with optional message reporting), so this could be used in an assertion-based framework. However, drive statements do essentially the same thing, and do it better:

  • A drive statement tests multiple DUT ports simultaneously. It forms a contract for every tested port, while an assertion forms a single contract. In software terms, a drive statement asserts that every output of a function is simultaneously correct, rather than testing a single output.
  • Reporting is automatic. If the DUT output or I/O is not as expected, then an internal fail counter is incremented, and an automatically-generated message is recorded
  • In a drive statement, you only need to provide the expected port value. In an assertion, you must also explicitly provide the actual value to be checked (as, for example, the value returned from a function call)
  • drive statements avoid the confusion inherent in assertions as to what exactly has broken (is it a programming failure during development, or a contract failure on a DUT output?)

In other words, the difference between Maia's assert and drive statements is much the same as the difference between Java's assert and JUnit's range of assertion tests (assertTrue, and so on) or, to a lesser extent, the difference between GoogleTest's ASSERT and EXPECT.
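
The behaviour described in the list above can be modelled loosely in Python. This is not Maia syntax and not MTF code: `DriveChecker` and the deliberately buggy `half_adder` are hypothetical, and the summary wording is simply borrowed from the style of an MTF log line.

```python
class DriveChecker:
    """Toy model of drive-style checking (not Maia syntax or MTF code)."""

    def __init__(self, dut):
        self.dut = dut  # callable: dict of inputs -> dict of outputs
        self.passes = 0
        self.fails = 0
        self.messages = []

    def drive(self, inputs, expected):
        """Apply inputs, then compare every expected output port at once."""
        actual = self.dut(inputs)
        bad = {p: (v, actual[p]) for p, v in expected.items() if actual[p] != v}
        if bad:
            self.fails += 1
            for port, (want, got) in sorted(bad.items()):
                self.messages.append(f"{port}: expected {want}, got {got}")
        else:
            self.passes += 1

    def summary(self):
        total = self.passes + self.fails
        return f"{total} vectors executed ({self.passes} passes, {self.fails} fails)"


def half_adder(inputs):
    """Hypothetical DUT with a deliberate bug: carry is OR instead of AND."""
    a, b = inputs["a"], inputs["b"]
    return {"sum": a ^ b, "carry": a | b}


chk = DriveChecker(half_adder)
for a in (0, 1):
    for b in (0, 1):
        chk.drive({"a": a, "b": b}, {"sum": a ^ b, "carry": a & b})
print(chk.summary())  # prints: 4 vectors executed (2 passes, 2 fails)
```

The caller supplies only the expected values; the checker fetches the actual values, counts the outcome, and writes the failure message itself.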

4: What is a 'small' test?

Everybody agrees that unit tests should test something 'small', but 'small' is not the same thing in software and hardware testing. The basic test unit translates well: a class or a method of that class in software, and a module in hardware. Beyond that, however, things get more complicated:

  1. Some software tests would be considered trivial by a hardware engineer: you would never start up an entire simulation, with all the associated overhead, simply to confirm that 2 was greater than 1, or that 3 times 4 was 12. When you're simulating a multiplier, you need to confirm that it works, and not just that 3*4=12.
  2. Software tests do not generally test the basic libraries that they depend on. The test framework for GROMACS (a molecular dynamics package), for example, includes the comment "Since these functions are mostly wrappers for similar standard library functions, we typically just test 1-2 values to make sure we call the right function". You might be tempted to do this yourself with the odd DesignWare or CoreGen module but, in general, you will be writing the low-level stuff yourself, and you have to confirm that you got it right.
  3. Many, if not most, of the tests for a software product simply confirm that the compiler has done what you thought you asked it to do: the correct exceptions are thrown from a method, a function input is actually a non-null two-dimensional array, and so on. This doesn't translate well to hardware development.

In short, hardware unit testing is much more about algorithmic testing of a small part of the code, using an exhaustive set of inputs, or a basic set of sanity tests together with corner cases if exhaustive testing is not possible. Note that this does not include pseudo-random tests: tests at the unit level should be directed. PR testing, if appropriate, should be reserved for integration testing, where the state space increases as modules are combined.
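To make the contrast with "3*4=12" concrete, here is a minimal Python sketch of exhaustive and directed testing. The `shift_add_multiply` function stands in for a hypothetical multiplier module, and the list of corner cases is an illustrative assumption rather than a recommended set.

```python
def shift_add_multiply(a, b, width=4):
    """Hypothetical unit under test: unsigned shift-and-add multiplier model."""
    product = 0
    for bit in range(width):
        if (b >> bit) & 1:
            product += a << bit
    return product & ((1 << (2 * width)) - 1)


def exhaustive_test(width=4):
    """Check every input pair; return (vectors, failures)."""
    failures = []
    vectors = 0
    for a in range(1 << width):
        for b in range(1 << width):
            vectors += 1
            if shift_add_multiply(a, b, width) != a * b:
                failures.append((a, b))
    return vectors, failures


def corner_cases(width):
    """Directed operands for widths too large to test exhaustively."""
    top = (1 << width) - 1
    return [0, 1, top, top - 1, 1 << (width - 1)]


print(exhaustive_test(4))  # prints (256, [])
```

A 4-bit multiplier needs only 256 vectors, so exhaustive testing is cheap; at 32 bits the same loop would need 2^64 vectors, which is where directed corner cases take over.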

5: How do you handle expected failures?

In common software test frameworks there is a limited ability to handle expected failures. In JUnit 4, for example, if you know that a test is going to throw an exception, and you're willing to accept it, you can declare the exception as expected, along these lines (or you can simply catch the 'unexpected' exception in the test):

@Test(expected=SomeException.class)
public void testSomething() {
     ...
}

In other words, you just ignore the exception, and pretend that the test passed. In hardware development, though, this is not going to cut it. There are always going to be failures during development. Some of these failures will be important, and some won't be. In general, you may see several different types of failure:

  1. A known bug in the hardware. In this case, you have already gone to the trouble of working out what the correct test output should be. This is the easy case: you just test against the expected correct result, and the test will show as failing until the hardware is corrected.
  2. Code that may or may not be 'good enough'. Your 120-bit arithmetic result is occasionally incorrect in the LSB, but it will take a large amount of additional hardware, and two weeks, to fix it. Do you mark this up as a failure, or do you ignore the LSB in your tests? How do you document this?
  3. Hardware that you designed, but with an unclear specification. The code does one thing, but you're now thinking that it should perhaps do something else. Is this a pass or a fail? How do you record what you think the code should be doing?
  4. Somebody else's code, that has no specification, and that you don't understand, and that may or may not work.

What's clear is that an assertion failure, or a lack of it, is not going to handle any of cases 2 to 4. The expected failure and documentation problems are, fundamentally, why the MTF uses golden logfiles to define test passes, rather than simply using a binary pass/fail predicate. This is described in more detail in the sections below.
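Case 2 can be sketched in Python as a comparison that ignores chosen bits but leaves a written record of doing so. This is only an illustration of the idea, under the assumption that such a note would end up in a logfile; `check_masked` is a hypothetical helper and not an MTF or Maia feature.

```python
def check_masked(actual, expected, ignore_mask, log):
    """Compare two integers, ignoring the bits set in ignore_mask."""
    care = ~ignore_mask
    ok = (actual & care) == (expected & care)
    if ok and actual != expected:
        # Record the tolerated difference so that it is documented
        # rather than silently dropped.
        log.append(f"note: mismatch in ignored bits (mask {ignore_mask:#x}): "
                   f"expected {expected:#x}, got {actual:#x}")
    return ok


log = []
exact = check_masked(0x1234, 0x1234, 0x1, log)    # identical values
lsb_off = check_masked(0x1235, 0x1234, 0x1, log)  # differs only in the LSB
wrong = check_masked(0x1334, 0x1234, 0x1, log)    # differs in a checked bit
print(exact, lsb_off, wrong)
print(log)
```

A bare assertion could return the same three booleans, but the decision to tolerate the LSB would then be visible only in the source code.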

6: How do you define 'pass' and 'fail'?

There are two obvious ways for a test framework to determine the status of a test:

  1. the xUnit way: if the test didn't throw an exception (in other words, no assertions failed) then it passed; if an exception was generated, then it failed
  2. the 'application' way: the test framework checks the return value from the test. The user might, for example, carry out an arbitrary sequence of checks (which may or may not be 'unit tests'), and exit the test program with exit(0) to signal a pass, and exit(1) to signal failure. The test framework might additionally detect assertion failures, and record a fail in these cases.

In both cases, the testing is completely binary: either the test passed, or it didn't. Both cases avoid the issue of expected failures, and neither records any useful information about the test itself. There are a number of other issues:

  • A single pass/fail predicate is a single point of failure waiting to happen. What if a failure condition accidentally results in an exit(0) rather than an exit(1)?
  • What exactly was tested and, if it failed, how did it fail? If you rely on a single pass/fail predicate, then the answer is only in the source code. Do you really want people to check your source code to find out what you tested?
  • How does your manager, or anybody else, actually know that you tested anything? When you look at the code again one year from now, how do you know what you tested? For a pass/fail predicate, the only documentation is the pass/fail count itself.

By contrast, a drive statement automatically records the number of times that the DUT outputs were as expected, and the number of DUT output failures, together with the failures themselves. This information is of far more value than a simple '0' indicating success.
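
The single-point-of-failure concern with exit codes can be demonstrated with a short Python sketch. The three "test programs" here are hypothetical one-liners run through the Python interpreter, and `run_as_application` is an invented helper.

```python
import subprocess
import sys


def run_as_application(source):
    """Run a test program and reduce its result to pass/fail by exit status."""
    proc = subprocess.run([sys.executable, "-c", source])
    return "pass" if proc.returncode == 0 else "fail"


GOOD = "import sys; sys.exit(0)"
BAD = "import sys; sys.exit(1)"
# A test program with a bug in its own reporting: the check fails,
# but the program still exits with status 0.
BUGGY = "import sys; ok = (3 * 4 == 13); sys.exit(0)"

for name, src in (("good", GOOD), ("bad", BAD), ("buggy", BUGGY)):
    print(name, run_as_application(src))
```

The buggy program is reported as a pass, and nothing in the result distinguishes it from the genuinely good one.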

7: Golden logfile testing

The MTF handles the ambiguity of the 'not quite right', or 'not really sure' tests with a 'golden' logfile. The logfile records the expected test results, as well as any other output generated by the program. In the simplest case, the entire logfile might look like this (this is the golden logfile for one of the tutorial examples):

  (Log)        (200 ns) 20 vectors executed (20 passes, 0 fails)

The test program (tut2.tv) had to do nothing more than execute the drive statements. However, there's nothing to stop you putting report statements in your code, with progress indicators, or general commentary on what you're testing. For longer integration tests, I add messages along the lines of "waiting for reset to complete", "waiting for GTHs to lock", "loading control registers", and so on. This adds documentation on what exactly the test is doing, inside the golden logfile itself, and will make it far easier for anyone else to understand your tests, and to confirm that they are doing something useful.
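A minimal Python sketch of the comparison step follows. It assumes a plain line-by-line diff; the MTF's own comparison may well be more sophisticated than this, and `compare_with_golden` is a hypothetical name.

```python
import difflib


def compare_with_golden(actual_lines, golden_lines):
    """Return (passed, diff) for a plain line-by-line comparison."""
    actual = [line.rstrip() for line in actual_lines]
    golden = [line.rstrip() for line in golden_lines]
    diff = list(difflib.unified_diff(golden, actual, "golden", "actual",
                                     lineterm=""))
    return (not diff), diff


golden = ["(Log)        (200 ns) 20 vectors executed (20 passes, 0 fails)"]
regressed = ["(Log)        (200 ns) 20 vectors executed (19 passes, 1 fails)"]

print(compare_with_golden(golden, golden)[0])     # prints True
passed, diff = compare_with_golden(regressed, golden)
print(passed)                                     # prints False
print("\n".join(diff))
```

On a mismatch the diff shows exactly which logged line changed, so any report messages in the log double as a description of where the test went wrong.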

The simulator will, however, add its own output to any messages produced by Maia. There could be a large number of these, including compilation and elaboration messages, time-zero warnings, copyright messages, and so on. These have to be intercepted and removed by the MTF before comparison against the golden logfile. Writing the Tcl filters to remove these messages is straightforward, but a new filter is required for each supported simulator, and the filters must occasionally be fixed for new versions of a given simulator.
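The filtering step might look something like the Python sketch below. The MTF itself uses Tcl filters; Python is used here only for illustration, and both the patterns and the sample simulator messages are invented rather than taken from any real simulator.

```python
import re

# Hypothetical patterns: each real simulator needs its own list.
NOISE_PATTERNS = [
    re.compile(r"^\s*$"),                       # blank lines
    re.compile(r"copyright", re.IGNORECASE),    # copyright banners
    re.compile(r"^(Compiling|Elaborating)\b"),  # build progress messages
    re.compile(r"^Warning: .* at time 0\b"),    # time-zero warnings
]


def strip_simulator_noise(lines):
    """Drop every line that matches one of the noise patterns."""
    return [line for line in lines
            if not any(p.search(line) for p in NOISE_PATTERNS)]


raw = [
    "Example Simulator 1.0, Copyright (c) Example Corp",
    "Compiling module counter",
    "Elaborating design",
    "",
    "Warning: X on input at time 0",
    "(Log)        (200 ns) 20 vectors executed (20 passes, 0 fails)",
]
print(strip_simulator_noise(raw))
```

Only the final line survives, which is the form that would then be compared against the golden logfile.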