Motivation
Wikipedia explains it very nicely: sorting is ubiquitous in computer science. It is a simple problem to state (“how can I sort the following values most efficiently?”) that has many solutions, and yet it still offers open problems.
We only consider correct algorithms, i.e., algorithms whose output is such that
- each element is larger than or equal to the previous one, according to the order picked,
- all the elements that were in the input are present in the output (with the same cardinality, if repetition is allowed).
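To make these two conditions concrete, they can be checked programmatically. The following is a minimal sketch (class and method names are illustrative, `int` values with their default order are assumed, and the “same cardinality” condition is checked by re-sorting both lists):

```csharp
using System.Collections.Generic;
using System.Linq;

static class SortChecks
{
    // A sort output is correct when (1) it is in non-decreasing
    // order and (2) it is a permutation of the input (same
    // elements, with the same cardinalities).
    public static bool IsCorrectSort(List<int> input, List<int> output)
    {
        for (int i = 0; i < output.Count - 1; i++)
            if (output[i] > output[i + 1])
                return false; // Condition 1 violated.
        // Condition 2: compare the two multisets.
        return input.OrderBy(x => x).SequenceEqual(output.OrderBy(x => x));
    }
}
```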
There are many ways of “comparing” sorting algorithms. A sorting algorithm…
- has a best, worst and average case time complexity (measured in general in number of comparisons required),
- has a best, worst and average case space complexity (i.e., “how much additional memory is required?”),
- can be “stable” (i.e., the relative order of equal values is preserved),
- uses an insertion, exchange, selection, or merging method,
- is serial or parallel,
among other properties.
Algorithm Comparison
The algorithms we are about to study compare as follows, omitting the constant factors, the lower-order terms, and letting $n$ be the size of the list we have to sort:
| Name | Best | Average | Worst | Memory | Stable |
|---|---|---|---|---|---|
| Insertion sort | $n$ | $n^2$ | $n^2$ | $1$ | Yes |
| Heapsort | $n \log n$ | $n \log n$ | $n \log n$ | $1$ | No |
| Bubble sort | $n$ | $n^2$ | $n^2$ | $1$ | Yes |
| Shellsort | $n \log n$ | $n^{3/2}$ | $n^2$ | $1$ | No |
| Quicksort | $n \log n$ | $n \log n$ | $n^2$ | $\log n$ | No |
| Selection sort | $n^2$ | $n^2$ | $n^2$ | $1$ | No |
| Merge sort | $n \log n$ | $n \log n$ | $n \log n$ | $n$ | Yes |
All the algorithms above are in-place except for merge sort.
Helper Methods
We start by defining two simple “helper” methods (they are assumed to live in a class with a generic type parameter `T` constrained to `IComparable<T>`):
```csharp
private static void Swap(List<T> listP, int lhs, int rhs)
{
    T current = listP[lhs];
    listP[lhs] = listP[rhs];
    listP[rhs] = current;
}

public static bool IsSorted(List<T> listP)
{
    bool isSorted = true;
    for (int i = 0; i < listP.Count - 1; i++)
    {
        if (listP[i].CompareTo(listP[i + 1]) > 0)
        {
            isSorted = false;
        }
    }
    return isSorted;
}
```

Insertion Sort Algorithm
Implementation
This algorithm is nicely explained and illustrated on Wikipedia, and can be implemented as follows:
```csharp
public static void InsertionSort(List<T> listP)
{
    T current;
    int slot;
    for (int bar = 1; bar < listP.Count; bar++)
    {
        current = listP[bar];
        for (
            slot = bar;
            slot > 0 && current.CompareTo(listP[slot - 1]) < 0;
            slot--
        )
        {
            listP[slot] = listP[slot - 1];
        }
        listP[slot] = current;
    }
}
```

Description
This algorithm moves the bar from the beginning of the list to the
end, one step at a time. At every step, it positions a slot on the bar and looks
back, moving the slot earlier and earlier and shifting each predecessor one
position to the right, as long as that predecessor is greater than the current
value; the current value is then written at the slot.
Complexity
As explained on
Wikipedia,
the simplest worst case input is an array sorted in reverse order.
With an array sorted in reverse order, every iteration of the inner loop
will scan and shift the entire sorted subsection of the array (i.e.,
from bar to the beginning) before inserting the next element. This
gives a quadratic running time (i.e., $O(n^2)$): bar takes $n$ values,
and the inner loop is itself linear in bar.
On the flip side, if the array is already sorted, then the algorithm is linear, since the inner loop will always exit after a single comparison, giving an overall best performance of $O(n)$.
But on average, the algorithm remains in $O(n^2)$, since the inner loop shifts, on average, half of the sorted prefix for each value of bar.
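To make the behavior concrete, here is a self-contained `int` transcription of the method above (class and method names are illustrative, not part of the original code):

```csharp
using System.Collections.Generic;

static class InsertionSortDemo
{
    // Int transcription of the generic InsertionSort above.
    public static void Sort(List<int> list)
    {
        for (int bar = 1; bar < list.Count; bar++)
        {
            int current = list[bar];
            int slot = bar;
            // Shift greater predecessors one position to the right...
            while (slot > 0 && current < list[slot - 1])
            {
                list[slot] = list[slot - 1];
                slot--;
            }
            // ...then drop the current value in the freed slot.
            list[slot] = current;
        }
    }
}
```

Sorting `{ 5, 2, 4, 6, 1, 3 }` yields `{ 1, 2, 3, 4, 5, 6 }`; on an already sorted input the inner `while` loop never executes, which is the linear best case.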
Heapsort Algorithm
Implementation
We first define a helper method:
```csharp
private static int LeftChild(int i)
{
    return 2 * i + 1;
}
```

and then leverage the heap structure to sort:
```csharp
public static void Heapsort(List<T> listP)
{
    // Step 1: heapify, or build heap.
    for (int i = listP.Count / 2; i >= 0; i--)
        PercDown(listP, i, listP.Count);
    // Step 2: repeatedly extract the largest value.
    for (int i = listP.Count - 1; i > 0; i--)
    {
        Swap(listP, 0, i);
        PercDown(listP, 0, i);
    }
}

private static void PercDown(List<T> listP, int i, int N)
{
    int Child;
    T current;
    for (current = listP[i]; LeftChild(i) < N; i = Child)
    {
        Child = LeftChild(i);
        if (
            Child != N - 1
            && listP[Child].CompareTo(listP[Child + 1]) < 0
        )
            Child++;
        if (current.CompareTo(listP[Child]) < 0) // current < listP[Child]
            listP[i] = listP[Child];
        else
            // If current >= listP[Child] we *do not*
            // swap: we are constructing a *max* heap!
            break;
    }
    listP[i] = current;
}
```

Note that `PercDown` builds a max heap: once the values are
“pre-sorted greater value first”, removing the first one to move it
to the end of the list makes the list sorted from smallest to greatest
value once we are done.
Description
This algorithm works in two steps:
- First, it constructs the heap “in-place”, by arranging the elements from the lowest level (`listP.Count / 2` corresponds to “the last parent”) up to the first level (`i >= 0`), so that each parent is greater than its children (i.e., this is a max heap: the greatest value comes first),
- Then, it repeatedly extracts the first element, and places it after the end of the heap: note that `PercDown(listP, 0, i)` makes it so that the heap is considered to stop at index `i`, which decreases by 1 at every iteration.
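To see why starting the heapify loop at `listP.Count / 2` is safe, note that an index `i` is a leaf exactly when its left child falls outside the heap, and `PercDown` is a no-op on a leaf. A small self-contained check (the `IsLeaf` helper is hypothetical, using the same 0-based indexing as `LeftChild`):

```csharp
static class HeapIndexCheck
{
    // Same indexing convention as the LeftChild helper above.
    public static int LeftChild(int i) => 2 * i + 1;

    // An index is a leaf when its left child is outside the heap
    // of size n: percolating down from it would do nothing.
    public static bool IsLeaf(int i, int n) => LeftChild(i) >= n;
}
```

For instance, with a heap of size 10, every index from 5 to 9 is a leaf, and the deepest parent is index 4.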
Complexity
- The `PercDown` method has complexity $O(\log n)$, since it iterates through the tree from top to bottom, i.e., its cost is relative to the tree height, which is $\log n$.
- The first step calls `PercDown` $n/2$ times, which is equivalent to $O(n)$ calls, so overall this first step is $O(n \log n)$.
- The second step also calls `PercDown` $O(n)$ times, so it is $O(n \log n)$ overall as well.

Hence, the complexity of heapsort is $O(n \log n)$ by the sum rule. Note that the average, worst and best case complexities are all the same!
Bubble Sort Algorithm
Implementation
```csharp
public static void BubbleSort(List<T> listP)
{
    for (int i = listP.Count - 1; i >= 0; i--)
    {
        for (int j = 0; j < i; j++)
        {
            if (listP[j].CompareTo(listP[j + 1]) > 0)
                Swap(listP, j, j + 1);
        }
    }
}
```

Description
The nested loops accomplish the following: “from the beginning of the list up to where the previous pass stopped, minus one, go through the elements one by one and swap two consecutive elements if needed”.
Complexity
Since both loops depend on the size of the list, $n$, the algorithm is on average $O(n^2)$: we need to perform on the order of $n \times n$ operations. An optimization (not implemented in the method above) that stops as soon as a whole pass swapped no elements brings the best case performance of bubble sort, on an already sorted list, down to linear ($O(n)$).
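A possible sketch of this early-exit optimization, as a standalone `int` variant (class and method names are illustrative); it returns the number of passes so the linear best case can be observed:

```csharp
using System.Collections.Generic;

static class BubbleSortEarlyExit
{
    // Bubble sort that stops as soon as a full pass performs
    // no swap; returns the number of passes performed.
    public static int Sort(List<int> list)
    {
        int passes = 0;
        for (int i = list.Count - 1; i >= 0; i--)
        {
            passes++;
            bool swapped = false;
            for (int j = 0; j < i; j++)
            {
                if (list[j] > list[j + 1])
                {
                    (list[j], list[j + 1]) = (list[j + 1], list[j]);
                    swapped = true;
                }
            }
            if (!swapped) break; // Already sorted: linear best case.
        }
        return passes;
    }
}
```

On an already sorted list a single pass suffices, matching the $O(n)$ best case.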
ShellSort Algorithm
Implementation
```csharp
public static void ShellSort(List<T> listP)
{
    int slot;
    T current;
    for (int gap = listP.Count / 3 + 1; gap > 0; gap /= 2)
    {
        for (int next = gap; next < listP.Count; next++)
        {
            // Goes through the array by steps of gap.
            current = listP[next];
            for (
                slot = next;
                slot >= gap
                    && current.CompareTo(listP[slot - gap]) < 0;
                slot -= gap
            )
            // Slides current until it is in place.
            {
                listP[slot] = listP[slot - gap];
            }
            listP[slot] = current;
        }
    }
}
```

Description
Consider a list of size 30; we have (assuming
`current.CompareTo(listP[slot - gap]) < 0` is always true):

| gap | next | slot | slot - gap |
|---|---|---|---|
| 11 | 11 | 11 | 0 |
| " | 12 | 12 | 1 |
| " | 13 | 13 | 2 |
| … | … | … | … |
| " | 22 | 22 | 11 |
| " | " | 11 | 0 |
| " | 23 | 23 | 12 |
| " | " | 12 | 1 |
| … | … | … | … |
| " | 30 | 30 | 19 |
| " | " | 19 | 8 |
| 5 | 5 | 5 | 0 |
| " | 6 | 6 | 1 |
| " | 7 | 7 | 2 |
| … | … | … | … |
| " | 10 | 10 | 5 |
| " | " | 5 | 0 |
| " | 11 | 11 | 6 |
| " | " | 6 | 1 |
| … | … | … | … |
| 2 | | | |
| … | … | … | … |
| 1 | | | |
The important point is to understand that we generate the sequences of
pairs (slot, slot-gap) as follows:
- Gap of 11: (11, 0), (12, 1), (13, 2), …, (22, 11), (11, 0), (23, 12), (12, 1), …, (30, 19), (19, 8),
- Gap of 5: (5, 0), (11, 6), (6, 1), …
which are sequences of values we are comparing. For the gap of 11, it means we do the following:
- First, we compare the values at indices 11 and 0, and swap them if needed,
- Then, we compare the values at indices 12 and 1, and swap them if needed,
- …
- Then, we compare the values at indices 30 and 19, and swap them if needed,
- If we did swap the values previously, then we compare the values at indices 19 and 8, and swap them if needed.
After we are done going through “the gap”, we know that all values that are `gap` indices apart are sorted. Reducing the value of `gap` down to 1 makes it so that the whole array is sorted.
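The gap sequence that produced the table above can be reproduced with a small self-contained loop (the class name is illustrative):

```csharp
using System.Collections.Generic;

static class GapSequence
{
    // Reproduces the gap sequence of the ShellSort method above:
    // start at count / 3 + 1, then halve until reaching 1.
    public static List<int> Gaps(int count)
    {
        var gaps = new List<int>();
        for (int gap = count / 3 + 1; gap > 0; gap /= 2)
            gaps.Add(gap);
        return gaps;
    }
}
```

For a list of size 30 this produces the gaps 11, 5, 2, 1, matching the table.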
Complexity
The complexity of shell sort depends on the “gap sequence” that is
used. We use `listP.Count / 3 + 1`, `(listP.Count / 3 + 1) / 2`,
`(listP.Count / 3 + 1) / 4`, …, 1. This sequence is in the spirit of Shell’s
original algorithm (halving the gap at every step), and it is of complexity $O(n^2)$ in the worst
case: indeed, we need to explore $O(\log n)$ gaps, and the passes for the
smallest gaps may require on the order of $n^2$ swaps in total. In the best case, if the array is already mostly
sorted, then we still need to explore $O(\log n)$ gaps, but each gap takes
only $O(n)$ comparisons, giving a $O(n \log n)$ complexity. On
average, the complexity depends a lot on the sequence, but can be
around $O(n^{3/2})$, which is still better than quadratic!

Playing with the gap sequence further can give a best, worst and average performance of $O(n \log^2 n)$!
Quick Sort Algorithm
Implementation
```csharp
public static void QuickSort(
    List<T> listP
)
{
    QuickSort(listP, 0, listP.Count - 1);
}

public static void QuickSort(
    List<T> listP,
    int leftP,
    int rightP
)
{
    if (leftP < rightP + 1)
    {
        T pivot = MedianOfThree(listP, leftP, rightP);
        int left = leftP;
        int right = rightP;
        while (left <= right)
        {
            // Looking for a value larger
            // than the pivot
            // on the left:
            while (listP[left].CompareTo(pivot) < 0) left++;
            // Looking for a value smaller
            // than or equal to the pivot
            // on the right, without "crossing"
            // left and right.
            while ((left <= right) && pivot.CompareTo(listP[right]) <= 0) right--;
            if (left < right)
                Swap(listP, left, right);
        }
        Swap(listP, left, rightP); // Move pivot back
        QuickSort(listP, leftP, left - 1); // Sort left sub-list
        QuickSort(listP, left + 1, rightP); // Sort right sub-list
    }
}

private static T MedianOfThree(
    List<T> listP,
    int left,
    int right
)
{
    int center = (left + right) / 2;
    // We sort the left, center and right
    // elements:
    if (listP[center].CompareTo(listP[left]) < 0)
        Swap(listP, left, center);
    if (listP[right].CompareTo(listP[left]) < 0)
        Swap(listP, left, right);
    if (listP[right].CompareTo(listP[center]) < 0)
        Swap(listP, center, right);
    // Move the pivot to the right:
    Swap(listP, center, right);
    return listP[right];
}
```

Description
At a high level, the algorithm
- picks a “median” value (the pivot), close to the middle of the list to be sorted,
- organizes the list so that the values smaller than the pivot are to its left, and the values greater than or equal to the pivot are to its right,
- then recursively calls quicksort on the sub-list to the left of the pivot, and on the sub-list to the right of the pivot,
- when the sub-lists left to be sorted are of size at most 1, quicksort is done (a list of size 1 is sorted!): the whole list is sorted.
In detail, this algorithm works as follows:

- Choose a pivot: we use the `MedianOfThree` method to select an element from the list as the pivot. Other ways of choosing the pivot exist; this “median-of-three” technique is a good choice when no information about the ordering of the input is known. Note that this method actually sorts those three elements (at the beginning, center and end of the list to be sorted), and takes the middle one as the pivot,
- Partition the list: the `while (left <= right)` loop in `QuickSort` proceeds as follows:
  - it first looks for the smallest `left` index such that `listP[left] >= pivot`,
  - it then looks for the greatest `right` index such that `listP[right] < pivot`, without letting `left` and `right` cross,
  - at this point, we know that the values at `left` and `right` need to be swapped (if `left < right`, i.e., they did not cross), and we swap them.
- When this loop is done, we know that `left` is where the pivot ought to be in the sorted list,
- We can then call `QuickSort` on the two sub-lists:
  - the one that goes from `leftP` to `left - 1` (i.e., the values to the left of the pivot),
  - the one that goes from `left + 1` to `rightP` (i.e., the values to the right of the pivot).

One subtlety is that we only know at the end of the partition that `left` is where the pivot value ought to be (we did not know where it should go when we started, we simply made an educated guess): hence, the pivot was parked at the end of the range by `MedianOfThree`, and swapped back into place at the end of `QuickSort`.
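As a small self-contained check of the pivot selection, here is an `int` transcription of `MedianOfThree` (the class name is illustrative):

```csharp
using System.Collections.Generic;

static class PivotDemo
{
    // Int transcription of MedianOfThree: sorts the left, center
    // and right elements, then parks the median (the pivot) at the
    // right end of the range and returns it.
    public static int MedianOfThree(List<int> list, int left, int right)
    {
        int center = (left + right) / 2;
        if (list[center] < list[left]) Swap(list, left, center);
        if (list[right] < list[left]) Swap(list, left, right);
        if (list[right] < list[center]) Swap(list, center, right);
        // Move the pivot to the right end:
        Swap(list, center, right);
        return list[right];
    }

    static void Swap(List<int> l, int i, int j) =>
        (l[i], l[j]) = (l[j], l[i]);
}
```

For instance, on `{ 9, 1, 5 }` the returned pivot is 5 (the median of the three values), and it ends up at the right end of the range, ready to be used by the partitioning loop.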
Complexity
The complexity of quick sort depends on how “lucky” we are when we pick the pivot value. The best case is when the pivot always divides the array in two equal halves. Let $C(n)$ be the complexity of sorting a list of size $n$ using quicksort:
- We need to perform a linear ($n$) number of comparisons to partition the values to the left and the right of the pivot, and we then have to sort each sub-list,
- To sort each sub-list, we need to first partition it, which takes a number of comparisons linear in its size ($n/2$),
- Then, sorting each sub-list takes $C(n/2)$.
- So, we get that $C(n) = n + 2C(n/2)$,
- Iterating this mechanism, we get that $C(n) = kn + 2^k C(n/2^k)$, for $2^k \leq n$,
- Then, taking $k = \log_2 n$, we get $C(n) = n \log_2 n + n C(1)$ and hence $C(n) \in O(n \log n)$.

The average case is also $O(n \log n)$, but if we are unlucky with our pivot (e.g., if it always ends up at the beginning or at the end of the list), then we keep on sorting a sub-list whose size is linear in $n$ (we partition our list into lists of size $0$ and $n - 1$), and we get a worst case complexity of $O(n^2)$.
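This worst case can be made precise with the same style of recurrence as the best case (a sketch):

```latex
C(n) = n + C(n - 1), \qquad C(1) \in O(1)
\quad\Longrightarrow\quad
C(n) = C(1) + \sum_{k=2}^{n} k \in O(n^2).
```

Each partition costs a linear number of comparisons but only removes the pivot, so the costs $n, n-1, \dots, 2$ add up to a quadratic total.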