This library implements a simple hash table that can be used as a dictionary or set implementation in C programs. It uses a "Robin Hood" probing algorithm, which is described here.
This library aims to provide a straightforward hash table implementation that is both readily understandable and suitable for most hash table use cases. Maximum possible performance and scabability are not design goals.
Also, the library provides no explicit support for concurrency. Any use of the library in a multi-threaded application will require external synchronization.
The library takes a "fail fast" approach to logic errors in the calling program. The calling program will be aborted if it violates the library's API contract. (See Abort conditions below.) This frees the calling program from the need to check for and react to unrecoverable programming errors.
This library is developed on Linux with GCC (15.2.1 as of this writing). It makes use of a number of GCC builtins and C23 features. It should also work with Clang.
The library imposes a number of limits.
In addition to the limits imposed by the library, the total size of a table is limited by the amount of memory available to the application. A table's total memory footprint scales with the product of the number of buckets and the entry size (i.e. total_size ∝ buckets × entry_size), so the largest possible table will not fit in the 4 GiB address space of a 32-bit system.
The basic usage pattern is:
To create a hash table, the following are required.
NOTE
The examples in this section omit error checking.
This example shows the simplest possible usage — a set of integers.
Note the following.
This example creates a dictionary that maps host names to IP addresses.
Note:
Some of the library's API functions are intended to be called via helper macros. The names of these functions end in an underscore (_), and their short descriptions in the API documentation are enclosed in parentheses.
Table entries may contain pointers to other objects. (See dict_entry.hostname in the example above.) If such an entry is deleted from a table without freeing these objects, memory leaks can occur. If a table has a free function set, that function will be called whenever an entry is deleted from that table, in order to free that entry's associated resources.
The following operations can cause entries to be deleted.
(Operations such as sht_swap(), that return the entry being removed from the table, will not call the free function.)
Note that entry resources should not be automatically freed if they are still referenced elsewhere. Users of the library must take care to avoid both memory leaks and use-after-free bugs.
The library supports 2 iterator variations — read-only and read/write.
The table below summarizes these differences.
| Read-only | Read/write | |
|---|---|---|
| Multiple iterators? | YES | NO |
| Replace entries? | YES | YES |
| Delete entries? | NO | YES |
Calling a function that may modify the structure of the table while any iterators (read-only or read/write) exist on the table will cause the library to abort the program. (See Abort conditions below.)
NOTE
The order in which an iterator returns the items in the table is effectively random, and it may change as entries are added to and removed from the table.
Some of the functions (and macros) in this library return an error indication in some circumstances.
| Function | Error return | Error info | Error codes |
|---|---|---|---|
| SHT_NEW() | NULL | 1 | SHT_ERR_ALLOC† |
| - sht_new_() | NULL | 1 | SHT_ERR_BAD_ESIZE, SHT_ERR_ALLOC |
| sht_init() | 0 | 2 | SHT_ERR_TOOBIG, SHT_ERR_ALLOC |
| sht_add() | -1 | 2 | SHT_ERR_TOOBIG, SHT_ERR_ALLOC, SHT_ERR_BAD_HASH |
| sht_set() | -1 | 2 | SHT_ERR_TOOBIG, SHT_ERR_ALLOC, SHT_ERR_BAD_HASH |
| sht_ro_iter() | NULL | 2 | SHT_ERR_ITER_LOCK, SHT_ERR_ITER_COUNT, SHT_ERR_ALLOC |
| sht_rw_iter() | NULL | 2 | SHT_ERR_ITER_LOCK, SHT_ERR_ALLOC |
| sht_iter_delete() | 0 | 3 | SHT_ERR_ITER_NO_LAST |
| SHT_ITER_REPLACE() | 0 | 3 | SHT_ERR_ITER_NO_LAST |
| - sht_ro_iter_replace_() | 0 | 3 | SHT_ERR_ITER_NO_LAST |
| - sht_rw_iter_replace_() | 0 | 3 | SHT_ERR_ITER_NO_LAST |
† SHT_NEW() checks the entry size during compilation.
If an error occurs, sht_new_() stores an error code in the variable referenced by its err argument, provided that the value of err is not NULL. A description of the error can be retrieved with sht_msg().
SHT_NEW() can be called with either 3 or 4 arguments. If a fourth argument is provided, it is used as the err argument in the internal call to sht_new_(). Otherwise, sht_new_() is called with its err argument set to NULL.
As mentioned above, the library takes a "fail fast" approach to API contract violations by the calling program. The library will abort() the calling program if any of the following occur.
| Function | Uninitialized | Initialized | Iterator(s) exist |
|---|---|---|---|
| sht_set_hash_ctx() | ABORT | † | |
| sht_set_eq_ctx() | ABORT | † | |
| sht_set_freefn() | ABORT | † | |
| sht_set_lft() | ABORT | † | |
| sht_set_psl_limit() | ABORT | † | |
| sht_init() | ABORT | † | |
| sht_free() | ABORT | ||
| sht_add() | ABORT | ABORT | |
| sht_set() | ABORT | ABORT | |
| sht_get() | ABORT | ||
| sht_size() | ABORT | ||
| sht_empty() | ABORT | ||
| sht_delete() | ABORT | ABORT | |
| sht_pop() | ABORT | ABORT | |
| sht_replace() | ABORT | ||
| sht_swap() | ABORT | ||
| sht_ro_iter() | ABORT | ||
| sht_rw_iter() | ABORT |
† Abort implied. (An iterator cannot be created on an uninitialized table.)