Testing for Bugs Not Usually Caught by Tests

One of the core ideas in testing is to use small pieces of code that verify the target program/module/function/etc exhibits a certain behavior. These small snippets of code are usually termed unit tests or simply tests.

While it is hard to underestimate their importance, they can only test what can be expressed as program code. It is borderline1 impossible however to ensure that memory leaks or buffer overruns did not happen when running the test suite — at least that is the case in programming languages that do not contain such checks as a matter of course.

To show off how to find bugs that are not commonly caught be unit tests, we will use a small C program that has several bugs and a small test suite. Its purpose is best displayed by just having a look at that test:

#include "vector.h"  
#include <stddef.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>

int main() {  
  vector v;
  vector_init(v, sizeof(double));

  printf("Initially, the vector should be empty. %s.\n",
    vector_empty(v) ? "OK" : "FAIL");
  printf("Its size should be zero. %s.\n",
    vector_size(v) == 0 ? "OK" : "FAIL");

  double x = 3.1415926535, y;
  vector_push_back(v, &x);

  printf("After inserting one element, it should not be empty. %s.\n",
    !vector_empty(v) ? "OK" : "FAIL");
  printf("Its size should be one. %s.\n",
    vector_size(v) == 1 ? "OK" : "FAIL");
  printf("Its capacity is %zu, which should be at least 1. %s.\n",
    vector_capacity(v), vector_capacity(v) >= 1 ? "OK" : "FAIL");

  vector_get(v, 0, &y);
  printf("The stored value should be retrievable. %s.\n",
    x == y ? "OK" : "FAIL");

  for(unsigned i = 2; i <= 30; ++i) {
     x = fabs(i - x);
    vector_push_back(v, &x);

    printf("Size should be %zu. %s.\n",
      i, vector_size(v) == i ? "OK" : "FAIL");
    printf("Its capacity is %zu, which should be at least %zu. %s.\n",
      vector_capacity(v), i, vector_capacity(v) >= i ? "OK" : "FAIL");
  }

  vector_clear(v);
}

The intention of those tests are rather simple: Create and initialize (remember that C does not provide constructors) a variable v of type vector (lines 8-9). The term "vector" is stolen from C++, where it is used for a type that implements a dynamic array. The same happens here: In the beginning (lines 11-14), it is empty. After inserting a number (line 17), its size is one, and its capacity large enough to store at least one element (19-24). The number that was stored can also be retrieved at index 0 (26-28). The insertion is then repeated a few times with different numbers (30-38), and finally the vector's storage is cleaned up (40).

So far so good, it seems as if everything is being accounted for: All variables are properly initialized and cleaned up where necessary. The major functionality associated with a dynamic array is also verified: It dynamically resizes and can be indexed.

Manual Code Inspection

One way to find bugs is to simply look at the implementation and go through it line by line. While it is very much possible that one of you will find a bug this way, remember that we do not just wish to find "a bug", but to increase the confidence that no further bugs are still lurking in the program.

That being said, here is vector.h:

#pragma once

#include <stddef.h>
#include <stdbool.h>

typedef struct {  
  size_t size;
  size_t capacity;
  size_t element_size;
  char* data;
} vector[1];

void vector_init(vector v, size_t const element_size);  
void vector_clear(vector v);  
bool vector_empty(vector const v);  
size_t vector_size(vector const v);  
size_t vector_capacity(vector const v);  
void vector_set(vector v, size_t index, void const* element);  
void vector_get(vector v, size_t index, void* element);

The vector type uses a trick I first encountered in the GNU Multiple Precision Arithmetic Library to effect something akin to pass-by-reference. It does so by defining the type vector to be an array of one element of an anonymous structure. When declaring a local variable, this really has little impact - an array of one variable of a user defined type2 is really not much different from a variable.

The magic happens when defining a function that takes a parameter of type vector, as there is a special rule in C (and C++) for parameters of array type: A parameter of type array of some type T, really is a parameter of type pointer to T! In combination with another fact – when passing an object of type array of T to a function expecting a pointer to T, the array automatically decays to a pointer to its first element – this means that we can declare variables as usual, but when passing a vector, a pointer to the anonymous structure is passed instead automatically.

And vector.c:

#include "vector.h"

#include <stdlib.h>

void vector_init(vector v, size_t const element_size) {  
  v->size = 0;
  v->capacity = 0;
  v->element_size = element_size;
  v->data = NULL;
}

void vector_clear(vector v) {  
  v->size = 0;
  v->capacity = 0;
  v->element_size = 0;
  v->data = NULL;
}

bool vector_empty(vector const v) { return v->size == 0; }  
size_t vector_size(vector const v) { return v->size; }  
size_t vector_capacity(vector const v) { return v->capacity; }  
void vector_pop_back(vector v) { --v->size; }

void vector_push_back(vector v, void const* element) {  
  if(v->size >= v->capacity) {
    size_t const new_cap = (v->capacity * 5) / 3 + 32;
    v->data = realloc(v->data, new_cap);
    v->capacity = new_cap;
  }
  ++v->size;
  vector_set(v, v->size - 1, element);
}

void vector_set(vector v, size_t index, void const* element) {  
  for(size_t i = 0; i <= v->element_size; ++i) {
    v->data[index * v->element_size + i] = ((char const*)element)[i];
  }
}

void vector_get(vector v, size_t index, void* element) {  
  for(size_t i = 0; i <= v->element_size; ++i) {
    ((char*)element)[i] = v->data[index * v->element_size + i];
  }
}

To reiterate: Even if you have found some mistakes, are you sure you have found all of them?

Unit Tests

If the small test suite would already find a lot of bugs, this post would be rather boring, so of course they do not. In fact, the unit tests give the following output when being run:

Initially, the vector should be empty. OK.
Its size should be zero. OK.
After inserting one element, it should not be empty. OK.
Its size should be one. OK.
Its capacity is 32, which should be at least 1. OK.
The stored value should be retrievable. FAIL.
Size should be 2. OK.
Its capacity is 32, which should be at least 2. OK.
[...]
Size should be 30. OK.
Its capacity is 32, which should be at least 30. OK.

However, a diligent software developer might notice that some warnings3 are being thrown during compilation of the test suite:

xample.c:17:2: warning: implicit declaration of function 'vector_push_back' is invalid in C99 [-Wimplicit-function-declaration]

xample.c:30:39: warning: format specifies type 'size_t' (aka 'unsigned long') but the argument has type 'unsigned int' [-Wformat]
  printf("Size should be %zu. %s.\n", i, vector_size(v) == i ? "OK" : "FAIL");

xample.c:31:90: warning: format specifies type 'size_t' (aka 'unsigned long') but the argument has type 'unsigned int' [-Wformat]
    printf("Its capacity is %zu, which should be at least %zu. %s.\n", vector_capacity(v), i, vector_capacity(v) >= i ? "OK" : "FAIL");

3 warnings generated.

The first warning may be most surprising in the sense that it compiles at all. The function vectorpushback really is not declared in vector.h, but in previous versions of the C programming language this just means that its argument types are deduced from the first usage which gives them as vector and double*, and a return type of int. This prototype is "compatible enough" on the target ABI so that it works in practice. This problem can however be trivially fixed by adding a proper forward declaration in vector.h.

The other two warnings both have to do with the type of i, which is not size_t, but rather unsigned and therefore should use the format specifier "%u". Again, this does not seem to cause a problem on the target ABI, but can lead to almost arbitrary horrible problems, as it can easily smash the stack.

After fixing all warnings, the unit tests still successfully run to completion, so let us now employ more advanced techniques.

ASan

What if we could simply tell the compiler to add more checks, preferring safety over performance? Well, simply put, we can! Modern versions of both gcc and clang support ASan (Address Sanitizer) simply by adding -fsanitize=address to their command line.

Stack Buffer Overflow

So, let's check what happens when the test suite is run with ASan enabled:

Initially, the vector should be empty. OK.
Its size should be zero. OK.
=================================================================
==2114==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffc18778808 at pc 0x0000004d8ffe bp 0x7ffc18778740 sp 0x7ffc18778738
READ of size 1 at 0x7ffc18778808 thread T0
    #0 0x4d8ffd in vector_set /home/ghast/Dropbox/projects/asanxample/vector.c:36:42
    #1 0x4d8ffd in vector_push_back /home/ghast/Dropbox/projects/asanxample/vector.c:31
    #2 0x4d930f in main /home/ghast/Dropbox/projects/asanxample/xample.c:17:2
    #3 0x7fd03a58660f in __libc_start_main (/usr/lib/libc.so.6+0x2060f)
    #4 0x417348 in _start (/home/ghast/Dropbox/projects/asanxample/obj/xample+0x417348)

Address 0x7ffc18778808 is located in stack of thread T0 at offset 104 in frame
    #0 0x4d91ef in main /home/ghast/Dropbox/projects/asanxample/xample.c:9

  This frame has 3 object(s):
    [32, 64) 'v'
    [96, 104) 'x' <== Memory access at offset 104 overflows this variable
    [128, 136) 'y'
[...]
==2114==ABORTING

Oops. Seems that, in spite of the test suite, we missed something rather important.

It seems as if pushing x in to the vector caused an overflow in which at least one byte more too much is read. Checking the implementation of vector_set, we can indeed find a off-by-one error in vector.c line 35, where we should not be checking for less-or-equal, but just for less.

By checking the implementation of the similarly written vector_get, we can find a similar bug, which we also fix before rerunning our test suite.

Heap Buffer Overflow

Initially, the vector should be empty. OK.
Its size should be zero. OK.
After inserting one element, it should not be empty. OK.
Its size should be one. OK.
Its capacity is 32, which should be at least 1. OK.
The stored value should be retrievable. OK.
Size should be 2. OK.
Its capacity is 32, which should be at least 2. OK.
Size should be 3. OK.
Its capacity is 32, which should be at least 3. OK.
Size should be 4. OK.
Its capacity is 32, which should be at least 4. OK.
=================================================================
==2190==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60300000f000 at pc 0x0000004d8f9d bp 0x7fff18d53820 sp 0x7fff18d53818
WRITE of size 1 at 0x60300000f000 thread T0
    #0 0x4d8f9c in vector_set /home/ghast/Dropbox/projects/asanxample/vector.c:36:40
    #1 0x4d8f9c in vector_push_back /home/ghast/Dropbox/projects/asanxample/vector.c:31
    #2 0x4d9434 in main /home/ghast/Dropbox/projects/asanxample/xample.c:28:3
    #3 0x7fa59bfe460f in __libc_start_main (/usr/lib/libc.so.6+0x2060f)
    #4 0x417348 in _start (/home/ghast/Dropbox/projects/asanxample/obj/xample+0x417348)

0x60300000f000 is located 0 bytes to the right of 32-byte region [0x60300000efe0,0x60300000f000)
allocated by thread T0 here:
    #0 0x4ad660 in realloc (/home/ghast/Dropbox/projects/asanxample/obj/xample+0x4ad660)
    #1 0x4d8e9d in vector_push_back /home/ghast/Dropbox/projects/asanxample/vector.c:27:13
    #2 0x4d930f in main /home/ghast/Dropbox/projects/asanxample/xample.c:17:2
    #3 0x7fa59bfe460f in __libc_start_main (/usr/lib/libc.so.6+0x2060f)
[...]
==2190==ABORTING

Oops again. This time, the bug is somewhat harder to debug, as the code location that triggers it is not the location that caused it. A small hint can be gleaned from checking what the size of the allocated memory region should be to be able to store at least 5 doubles: 5*8 = 40. Its size, as reported by ASan is only 32 bytes, meaning that the error must be somewhere in the reallocation logic.

The error really is a semantic one: Size and capacity are stored as numbers of elements, while realloc expects a byte count. This means that the reallocation should be done like this: v->data = realloc(v->data, new_cap * v->element_size);

Memory Leak

Alright, having already fixed a veritable multitude of bugs, let us run the test suite once more:

Initially, the vector should be empty. OK.
Its size should be zero. OK.
After inserting one element, it should not be empty. OK.
Its size should be one. OK.
Its capacity is 32, which should be at least 1. OK.
The stored value should be retrievable. OK.
[...]
Size should be 30. OK.
Its capacity is 32, which should be at least 30. OK.

=================================================================
==2222==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 256 byte(s) in 1 object(s) allocated from:
    #0 0x4ad660 in realloc (/home/ghast/Dropbox/projects/asanxample/obj/xample+0x4ad660)
    #1 0x4d8ec7 in vector_push_back /home/ghast/Dropbox/projects/asanxample/vector.c:27:13
    #2 0x4d934f in main /home/ghast/Dropbox/projects/asanxample/xample.c:17:2
    #3 0x7fb8bbb1460f in __libc_start_main (/usr/lib/libc.so.6+0x2060f)

SUMMARY: AddressSanitizer: 256 byte(s) leaked in 1 allocation(s).

Oops, a final time. Seems like there may be something wrong with the cleanup function. Let's have another look at it:

void vector_clear(vector v) {  
  v->size = 0;
  v->capacity = 0;
  v->element_size = 0;
  v->data = NULL;
}

Well, since we allocated memory on the heap, there should be a free somewhere in here... But instead the pointer is just nulled without cleaning up the memory. While that might work in a garbage collected language, it will just cause a memory leak in C. Once more the fix is simple:

void vector_clear(vector v) {  
  v->size = 0;
  v->capacity = 0;
  v->element_size = 0;
  free(v->data);
  v->data = NULL;
}

Note that all elements are zeroed out by the clear function although it is not really necessary at all. Although this coding style is very defensive in nature and might very well help in other cases, it did not protect against any of the bugs we encountered in this post.

Closing Remarks

After fixing the memory leak, all bugs that I intentionally added to this example have been detected and removed. While there may be many more bugs hidden in here4, we can now be at least a bit more confident in the code.

While this example used C, it works almost identically in C++. To make it a standards compliant C++ program, it is however necessary to replace the C standard library headers with their respective C++ versions (drop the ".h" ending and add a "c" in front) and use the namespace ::std for types such as ::std::size_t.

If you wish to repeat these experiments, the original code is available here. To enable ASan, just remove the hash sign in line 5 of the makefile.

Footnotes

  1. One could always use a "unit test" that rewrites the original program in some way to add those checks. A simple version of that is to define custom malloc, calloc, realloc and free, which allows one to implement a bunch of heap checks.

  2. There is a difference for many built-in types, as their operators have special meaning. Adding one int to a different int has completely different meaning than trying to add two arrays of integers – which really has no meaning at all. Since C does not allow user defined operators, this cannot happen for user defined types.

  3. The warnings are slightly reformatted to better fit this format.

  4. For example: There are some problems with failed memory allocations, as the realloc call is not checked for success.

Daniel Schemmel

is currently employed at the Chair of Communication and Distributed Systems at RWTH Aachen University, where he researchs the testability of distributed systems. He can be reached at blog(at)gha.st.

Aachen, Germany, Terra, Sol, Milky Way, Laniakea SC https://gha.st/about/