Introduction
Abstract Data Type
A dictionary, also called a hash, an associative array, a map,
or a hashmap, is a key-value store: it stores values (that can be of
any type) and indexes them using a key (which is in general of a simple
type, such as int or string).
Described abstractly, a dictionary is
- a finite collection of pair of elements (one key, one value),
 - in no particular order,
 - that may contain the same value multiple times (it cannot contain the same key twice, however).
 
Generally, it has operations to…
- … create an empty dictionary,
 - … test for emptiness,
 - … insert or update a key-value pair,
 - … remove a key-value pair,
 - … test for the existence of a key.
 
And, very importantly, it uses
- a hash function, which transforms a key into an 
int(its hash), used as to produce an array index, - a collision resolution strategy, which handles when two different keys have the same hash.
 
Note that the collision resolution strategy is useful only when two different keys have the same hash: a key should always get assigned the same hash, and since a key cannot be part of two different key-value pair, we should not try to resolve this conflict, but instead throw an exception.
Overview
A dictionary organizes the key-value pairs into an array by storing it in its corresponding index, computed using the hash of the key. The main benefit of this approach is that looking if a key-value pair is already in the dictionary is immediate: it suffices to hash the key, and to look at the index obtained if the same key is already stored. The main downside is that multiple (different) keys can be assigned the same hash, and hence the same index: indeed, since the keys that will be used is not known ahead of time, it is possible that different keys are assigned the same index. This is a collision, and there are two main ways of resolving it:
- One can use chaining so that the array actually contains lists of key-value pairs.
 - One case use open addressing so that “the next available index” is used. To determine which “next index” should be used, we will need a probe sequence strategy, i.e., a way of deciding how the next index is computed.
 
Possible Implementation
Preliminaries
Implementing dictionaries requires some preliminaries: prime numbers,
enumerated
types,
and the built-in GetHashCode
method.
We first illustrate prime numbers with the following class, remembering that
A prime number is a natural number greater than 1 that is not a product of two smaller natural numbers.
public static class PrimeHelper
{
  public static bool IsPrime(int n)
  {
    bool returned = true;
    if (n < 2)
      returned = false;
    if (n == 2 || n == 3)
      returned = true;
    if (n % 2 == 0)
      returned = false;
    // We use uint so that we can compute larger primes, since
    // we know that i will not need to be negative.
    uint i = 3;
 
    /*
     * Our goal is to find i > 1 and j > 1 such that
     * i * j = n
     * to prove that n is not prime.
     * Assuming i <= j, it suffices to look "up to" i * i <= n (i.e, so that i is less than √n).
     * We also exit our loop as soon as returned is proved false.
     */
 
    while (i * i <= n && returned)
    {
      if (n % i == 0)
        returned = false;
      i += 2;
    }
    return returned;
  }
 
  public static int NextPrime(int n)
  {
    int returned = n;
    if (returned <= 2)
    {
      // The smallest prime greater than or equal to 2
      // is … 2!
      returned = 2;
    }
    else
    {
      // Since 2 is the only even prime,
      // we make the returned value even
      // if it is divisible by 2.
      if (returned % 2 == 0)
        returned++;
 
      // Then, we simply look for the next Prime value,
      // incrementing two by two.
      while (!IsPrime(returned))
      {
        returned += 2;
      }
    }
    return returned;
  }
}We can then declare a Level enumerated type, and demonstrate how to
use it, our PrimeHelper static class, and a couple of examples using
GetHashCode:
using System;
using System.Collections.Generic;
 
class Program
{
  public enum Level
  {
    Low,
    Medium,
    High,
  }
 
  static void Main(string[] args)
  {
    // Demonstrating enum type.
    Console.WriteLine("Demonstrating enumerated types:");
    Level lvl1 = Level.Medium; // To access the value, we prefix with Level.
    Level lvl2 = (Level)0; // We can cast an int into a Level.
    Console.WriteLine(lvl1);
    Console.WriteLine(lvl2.ToString());
    // Will display "Medium" then "Low": the ToString() method is given by default.
 
    // Demonstrating PrimeHelper class:
    Console.WriteLine("\nDemonstrating PrimeHelper class:");
    // Testing IsPrime
    int[] testingValues =
    {
      89,
      6700417,
      2147483646,
      2147483647,
    };
    // 2147483647 remained the largest known prime until 1867,
    // and is the largest value that a signed 32-bit integer field can hold!
    // cf. https://en.wikipedia.org/wiki/2,147,483,647
    foreach (int i in testingValues)
    {
      Console.WriteLine(
        $"{i:N00} is prime: {PrimeHelper.IsPrime(i)}."
      );
    }
    // Testing NextPrime
    for (int i = 0; i < 10; i++)
    {
      Console.WriteLine(
        "The smallest prime greater than or equal to "
          + i
          + " is "
          + PrimeHelper.NextPrime(i)
          + "."
      );
    }
    Console.WriteLine(
      "The smallest prime greater than or equal to 2,147,483,000 is: "
        + PrimeHelper.NextPrime(2147483000)
    );
 
    // Demonstrating GetHashCode:
    Console.WriteLine(
      "\nDemonstrating GetHashCode method:"
    );
    Console.WriteLine(
      "The hash code of an empty array of 12 int is: "
        + (new int[12]).GetHashCode() // 156563611
        + "."
    );
    Console.WriteLine(
      "The hash code of an empty array of 14 string is: "
        + (new string[14]).GetHashCode() // -800837957
        + "."
    );
    Console.WriteLine(
      "The hash code of \"test string\" is: "
        + "test string".GetHashCode() // 1040790544
        + "."
    );
    Console.WriteLine(
      "The hash code of 12 is: " + 12.GetHashCode() + "." // 12
    );
  }
}Using Open Addressing
Setting-up
The first step in our definition of dictionary is to require two
generic type parameters: one for the key
(which will generally be simple, such as int or string), and one for
the values. We additional require that the datatype for keys realizes
the IComparable interface:
public class CDictionary<TKey, TValue>
  where TKey : IComparable<TKey>We then define the Cell class, which we will use to store the key and
value. A third attribute of the enumerated datatype StatusType, will
be used to mark if the cell is empty, active or deleted: its purpose
will become clearer later on.
    public enum StatusType
    {
        Empty,
        Active,
        Deleted
    };
 
    private class Cell
    {
        public StatusType Status { get; set; }
        public TValue Value { get; set; }
        public TKey Key { get; set; }
 
        public Cell(
          TKey keyP = default(TKey),
          TValue valueP = default(TValue),
          StatusType statusP = StatusType.Empty
        )
        {
            Key = keyP;
            Value = valueP;
            Status = statusP;
        }
 
        public override string ToString()
        {
            return Key + ":" + Value;
        }
    }Then, a dictionary has only two attributes: an array of Cells, and a
probe sequence strategy of the PSSType.
    public enum PSSType
    {
        Linear,
        Quad,
        Double
    };    private Cell[] table;
    private readonly PSSType Strategy;
 
    public CDictionary(
      int size = 31,
      PSSType probesequenceP = PSSType.Double
    )
    {
        table = new Cell[PrimeHelper.NextPrime(size)];
        Strategy = probesequenceP;
    }The default size of 31 and the reason why we are using the NextPrime
method are discussed further
down.
Computing the index
The next important bit is to decide where a pair key, value will be
stored in the table array. For this, we need to compute a valid index
based on the key, knowing that we may have to solve conflicts. Hence,
computing an index requires to
- start from the hash of the key, as computed by 
GetHashCode, - take its absolute value (since a 
GetHashCodecan return negative values), - “shift” the hash if a collision happened (possibly multiple times), by adding something to it,
 - take the remainder of dividing this number by 
table.Length, to make sure we produce a “valid” index. 
All together, this guarantee that the index we produced is positive, and
less than table.Length.
We obtain the following, where the details of CollisionResolution are
not important at this point: the crucial point is to note that if
countP is 0, then no collision have occurred yet, and we do not shift
the hash.
    public int GetIndex(TKey keyP, int countP)
    {
        // countP captures the number of times we had 
        // to solve a collision.
        return (
            Math.Abs(keyP.GetHashCode())
            + CollisionResolution(keyP, countP)
          ) % table.Length;
 
    }
    // GetIndex is public for demonstration purposes, 
    // but should really be private.
 
    // This is how collisions are handled.
    // It depends on the strategy picked (Strategy),
    // the key (keyP), and the number of time we had
    // to handle a collision (countP).
    private int CollisionResolution(TKey keyP, int countP)
    {
        if (countP == 0)
            return 0;
        else
        {
            if (Strategy == PSSType.Linear)
                return countP++;
            else if (Strategy == PSSType.Quad)
                return countP * countP;
            else if (Strategy == PSSType.Double)
                // This is double hashing.
                return countP * GetHash2(keyP);
            else
                // This is needed to compile:
                // even if we know that those are 
                // the only values in the PSSType 
                // enumerated datatype, C# will
                // complain that not all code path 
                // return a value otherwise.
                throw new ApplicationException(
                  "Unknown collision startegy."
                );
        }
    }Adding an element
Adding an element is a delicate process. We only need a key and a value, and then we
- 
make sure the dictionary does not already contain a key-value pair with the same key (detailed below),
 - 
compute an index, store it into a variable
index, and proceed as follows:As long as the table contains a
Cellatindexwhose status is notDeletednorEmpty, we- Increment the counter 
countthat count the number of times we had to “look for a new index” (i.e., solve a conflict), - Check if 
countreached the size of the array:- if that is the case, throw an exception: we saw as many collisions as there are slots in the array, meaning that the array is full,
 - otherwise, generate another 
index, knowing that we metcountconflicts already. 
 
 - Increment the counter 
 - 
once we have reached this point (i.e., we found a suitable
index), we know thatindexrefers to a place in the array that is either- empty, in which case we can create a 
Cellobject using the parameters, - with a status set to 
deletedorempty, and we can re-use it. 
 - empty, in which case we can create a 
 
    public void Add(TKey keyP, TValue valueP)
    {
        if (Find(keyP))
        {
            throw new ArgumentException(
                              "A value with key "
                                + keyP.ToString()
                                + " is already present."
                            );
        }
        int count = 0;
        int index = GetIndex(keyP, count);
        // Collision resolution
        while (
          table[index] != null
          && table[index].Status == StatusType.Active
        )
        {
            count++;
            if (count == table.Length)
            {
                // If table is full, throw an exception.
                throw new ApplicationException("Table is full.");
            }
            else
            {
                // There is still room, generate the next index.
                index = GetIndex(keyP, count);
            }
        }
 
        if (table[index] == null) // table slot is empty (e.g. never been used)
            table[index] = new Cell(
              keyP,
              valueP,
              StatusType.Active
            );
        else
        {
            table[index].Value = valueP;
            table[index].Key = keyP;
            table[index].Status = StatusType.Active;
        }
    }Finding a key
For find, we use a subroutine FindI that computes the index of a key
if it exists, returns -1 otherwise. The critical point is to understand
that we need to keep looking even if the cell is marked as deleted.
We illustrate this point below.
    public bool Find(TKey keyP)
    {
        bool found = FindI(keyP) != -1;
        return found;
    }
 
    public int FindI(TKey keyP)
    {
        bool found = false;
        int count = 0;
        int index = GetIndex(keyP, count);
        while (table[index] != null && count < table.Length && !found)
        {
            if (table[index].Status == StatusType.Active && table[index].Key.Equals(keyP))
            {
                found = true;
            }
            else if (table[index].Status == StatusType.Empty)
            {
                // We can exit, the value will not be found.
                count = table.Length;
            }
            else
            {
                count++;
                index = GetIndex(keyP, count);
            }
        }
        // Uncomment the following line if you'd like to get a better feel
        // for how this method works.
        // Console.WriteLine("Found " + keyP + " in " + count + ": " + found + ".");
        if (!found) { index = -1; }
        return index;
    }Handling deletion
The Remove method heavily relies on FindI:
    public void Remove(TKey keyP)
    {
        int index = FindI(keyP);
        if (index == -1)
        {
            throw new ApplicationException("Cannot delete missing key.");
        }
        else
        {
            table[index].Status = StatusType.Deleted;
        }
    }The important aspect is to understand why we use the Deleted status
instead of simply removing the Cell. There is one important reason for
that. Imagine the following scenario:
- We want to insert two key-value pairs with keys 
"Mary"and"Lora". BothGetIndex("Mary", 0)andGetIndex("Lora", 0)return 6 (with antableof size 13). - When we insert the value with key 
"Mary", we use the index 6. - When we insert the value with key 
"Lora", we useGetIndex("Lora", 1)to generate a new index (its value will depend on the strategy that was picked, with theLinearmethod we would get 7 if this index is available, otherwise we would have to compute the next available index usingGetIndex("Lora", 2), and so on). - We delete the value with key 
"Mary", and then look for the value with key"Lora". OurFindalgorithm computesGetIndex("Lora", 0), gets 6, look attable[6]. If ourRemovemethod simply deleted the cell containing the key with value"Mary", then ourFindalgorithm would conclude that no value with key"Lora"is in thetable, since the index it computed is unoccupied. 
This is the reason why we need to keep track of the deleted cells, to
make sure Find will keep looking because it knows that possibly, when
the value with key "Lora" was inserted, its index was already taken.
How is the size of the array decided?
The size of the array will in general be a prime number. This is discussed in detail on stackexchange, but can be easily illustrated. Let us assume that our dictionary
- uses an array of size 4 (so, not a prime number),
 - uses the quadratic probe sequence strategy,
 - receives one value with hash 1,
 - then receives two values, both with hash 0.
 
The first value gets inserted at index 0, the second gets inserted at index 1. For the third value:
- The first index that 
Inserttries is 0, but a value is already there. - The second index that 
Inserttries is , but it is taken already. - The third index that 
Inserttries is , which gives … 0 modulo 4, - And then the sequence goes forever: 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, …
 
More generally, the quadratic probe sequence gives,
| Table size | Probe sequence | 
|---|---|
| 12 | 0, 1, 4, 9, 4, 1, 0, 1, 4, 9, … | 
| 13 | 0, 1, 4, 9, 3, 12, 10, 10, 12, 3, … | 
While still not ideal, we can see that using a prime number for the size allows to “break the cyclicity” every now and then and to obtain additional numbers in the sequence: we go from 4 different indices to 7 per 10 indexes.
The default size of 31 is picked for various reasons, some being historical.
As we can see, the quadratic probing strategy has an issue that linear probing does not have: it may “skip” some indices, and incorrectly returns that the array is full while it is not. Why would we chose it, then? Because of clustering.
Clustering
An important goal of dictionaries is to avoid having parts of the array
filled while other parts are left unused, a situation known as
clustering. This is detrimental, because finding an index requires
more and more computation if keys are often given the same or close
indices (i.e., we need to call GetIndex with higher countP values).
This situation will happen if too many keys are given the same hash and index, something that is hard to predict since keys will in general not be uniformly distributed and not known ahead of time. Linear probing is very bad in solving this problem, since the clusters are “spread out continuously”, quadratic probing is an improvement, but only partially solve this issue, since keys with identical hashes will still follow the same sequence. Double hashing is a bit better at solving this problem, since keys with identical hashes may drift apart significantly when the secondary hash function is applied:
    private int GetHash2(TKey key)
    {
        return table.Length - (key.GetHashCode() % table.Length);
    }A second Hash function must never evaluate to zero (otherwise we are just trying the same spot again and again), be as independent from the first hash function as possible, and should help in trying as many slots as possible.
Note that our function never evaluate to zero:
key.GetHashCode() % table.Lengthgives a value betweentable.Lengthandtable.Length(sinceGetHashCodecan return negative values or 0),- so 
table.Length - (key.GetHashCode() % table.Length)gives a value between 1 andtable.Length. 
Our main method includes a test demonstrating the efficiency of our
double hashing techniques:
                    CDictionary<string, string> demoD = new CDictionary<
          string,
          string
        >(1009, CDictionary<string, string>.PSSType.Double);
        bool[] arrayD = new bool[1009];
        for (int i = 0; i < arrayD.Length; i++)
        {
            arrayD[demoD.GetIndex("Test", i)] = true;
            // Uncomment the following if you'd like to see 
            // which indices are hit.
            // Console.WriteLine(i +": " + demoD.GetIndex("Test", i) + ".");
        }
        count = 0;
        for (int i = 0; i < arrayD.Length; i++)
        {
            if (arrayD[i]) count++;
        }
        Console.WriteLine($"We hit {((decimal)count / arrayD.Length):p} of the indices.");While the quadratic method hits about 50% of the indices, the double hashing techniques reach 100%!
This general discussion relates to performance and requires to measure the dictionary’s load factor, which is the number of entries occupied in the hash table divided by the table length (or number of “buckets”). Of course, open-addressed hash table cannot have a load factor greater than 1, but other techniques, such as chaining, allows for larger load factors.
Clearing
Clearing the dictionary set all the Status to Empty:
    public void Clear()
    {
        for (int i = 0; i < table.Length; i++)
            if (table[i] != null)
                table[i].Status = StatusType.Empty;
        // Reuse cells by setting them to Empty
    }We decide to do “object reuse”, or pooling, so that we can re-use the
object instead of deleting it by setting the array to null and
re-creating Cell objects when needed. This may or may not be a
performance gain, based on context, but must be done using Empty
instead of Deleted: otherwise, the FindI method may go through many
Cell that have been cleared instead of deciding more rapidly that a
key is missing. The Add method, however, already handle this case by
having the same behavior for Deleted and Empty status.