All Sorts of Sorting Sorcery...
Part 1: What's This Series All About?
I've been around a while: the predominant gray in my hair is a strong indication that I remember the days of BASIC, when 64k was more than enough memory to run my text-based adventure games and avoid being eaten by a Grue.
It wasn't long after my first Grue encounter that I was also introduced to the concept of various sorting algorithms, and their subsequent "Big O" notation, used to indicate how "efficient" those algorithms were.
For our discussion in this article, I'm going to assume that you're familiar with various sorting algorithms, and that you're also passingly familiar with "Big O" notation.
I remember that early on, the Selection sort seemed to resonate with me. I understood it the quickest, but also quickly learned that it's not very efficient: it has a Big O notation of O(n^2). This basically means that every time you double the size of the data that you're sorting, it takes this algorithm 4 times longer to sort it. (If it sorts a entries in a^2 time, then it sorts 2a entries in (2a)^2 time, which is 4 x a^2.) One conclusion from all this is that you can use a Selection sort as long as your pool of data is fairly small.
Quicksort, on the other hand, was the darling of efficiency. It could reliably sort any pool of data in O(n*log(n)) time, which is as good as it gets with sorting algorithms. This means that doubling the amount of data only causes a slight increase in the amount of time the Quicksort algorithm needs to accomplish its job. But Quicksort is a bit more complex to code, and often takes up more memory than Selection sort when it's executing. If you have a large pool of data, the speed of Quicksort generally far outshines the humble Selection sort.
A Vague Memory
The above facts had been safely tucked away in the attic of my mind for many years. Then I recently happened upon an internet article that took these facts out of the attic, dusted them off, and added some new info of its own.
Try as I might, I don't seem to be able to find that article online any more, but I have a vague memory of it. The article was discussing Quicksort vs. Selection sort, and made the claim that there was a particular threshold value, below which Selection sort was a better choice to sort that number of items. And, of course, above that threshold value, it made more sense to use Quicksort. My vague memory of the value discussed in the article was that it was somewhere between 70 and 90. If that vague memory flushes out any similar recollections from the corners of your mind, I'd love to hear about it.
Where We're Headed
So, over the course of this series of posts, I'd like to accomplish a few things:
- Confirm that the theoretical efficiency values listed above for Selection Sort and Quicksort can be seen in actual practice.
- Attempt to determine empirically the threshold value from my vague memory, and confirm that Selection Sort is more performant below that threshold, and Quicksort above.
- And then, let's see whether the
We'll undertake the first goal above in the next article. Stay tuned...