The Sorting Hat

Here's a lesson about the difference between theory and practice.


Everyone knows about bubble sort and merge sort - they're probably the first two sorting algorithms you learned. Everyone also knows that (1) bubble sort sucks and (2) that's because its average time complexity is O(n^2), about the worst you'll see among mainstream sorting algorithms. It even generally performs worse than insertion sort and selection sort, the "a D- is still a passing grade" of sorting algorithms.
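To get a feel for how fast that gap opens up, here's a back-of-envelope comparison of the dominant terms, n^2 versus n log n - these are just growth estimates, not measured operation counts:

```javascript
// Rough comparison-count estimates for each complexity class.
// Not a benchmark - just the dominant terms of each big-O.
for (const n of [8, 64, 512]) {
  const quadratic = n * n                 // ~bubble sort's average/worst case
  const linearithmic = n * Math.log2(n)   // ~merge sort
  const ratio = (quadratic / linearithmic).toFixed(1)
  console.log(`n=${n}: n^2=${quadratic}, n*log2(n)=${linearithmic.toFixed(0)}, ratio=${ratio}`)
}
```

At n=8 the quadratic term is only about 2.7x bigger; by n=512 it's about 57x. That's the theory this post is about to poke at.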


While all these things are true, there's a more important truth: always test your code.


The other day, while reading The Imposter's Handbook by Rob Conery and flipping through the comparison sort algorithms he provides, I decided: why not actually test this code? And, speed test demon that I am, my test involved sorting his [23,4,42,15,16,8] array one million times with bubble sort and one million times with merge sort, and I got this result:
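(The post doesn't reproduce that first quick test, but it was along these lines - a sketch using condensed versions of the bubbleSort and mergeSort implementations shown further down; exact timings will vary by machine, and each iteration sorts a fresh copy so bubble sort never gets a pre-sorted freebie:)

```javascript
// Condensed bubble sort: keep sweeping until a pass makes no swaps.
function bubbleSort (list) {
  let swapped = true
  while (swapped) {
    swapped = false
    for (let i = 0; i + 1 < list.length; i++) {
      if (list[i + 1] < list[i]) {
        const tmp = list[i]
        list[i] = list[i + 1]
        list[i + 1] = tmp
        swapped = true
      }
    }
  }
  return list
}

// Condensed merge sort: split in half, sort each half, merge.
function mergeSort (list) {
  if (list.length <= 1) return list
  const mid = Math.floor(list.length / 2)
  const left = mergeSort(list.slice(0, mid))
  const right = mergeSort(list.slice(mid))
  const result = []
  while (left.length || right.length) {
    if (!right.length || (left.length && left[0] < right[0])) {
      result.push(left.shift())
    } else {
      result.push(right.shift())
    }
  }
  return result
}

const iterations = 1000000
const source = [23, 4, 42, 15, 16, 8]

let start = Date.now()
for (let i = 0; i < iterations; i++) bubbleSort([...source])
console.log(`bubble: ${Date.now() - start} ms`)

start = Date.now()
for (let i = 0; i < iterations; i++) mergeSort([...source])
console.log(`merge: ${Date.now() - start} ms`)
```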


Well, that's funny.

I checked through my implementations of the algorithms and they seemed fine.  I checked to make sure that I wasn't testing, for instance, merge one million times and bubble only a hundred thousand times.  I ran the tests a couple more times and got basically the same results, minus a millisecond or two here or there.

Well, I couldn't leave it there... In past experiments I've found that algorithms with bad time complexity can run as fast as or faster than "better" algorithms, but only on small inputs. As you'd expect, bad time complexity hurts more and more as the input grows. So I built this testing monstrosity to find out where bubble sort's bad time complexity overtakes whatever issues are plaguing this particular implementation of merge sort in JavaScript:

function bubbleSort (list) {
  let repeat = false
  let limit = list.length
  let defaultNext = Number.POSITIVE_INFINITY

  for (let i = 0; i < limit; i++) {
    let thisNumber = list[i]
    let nextNumber = i + 1 < limit ? list[i + 1] : defaultNext

    if (nextNumber < thisNumber) {
      list[i] = nextNumber
      list[i + 1] = thisNumber
      repeat = true
    }
  }

  if (repeat) {
    return bubbleSort(list)
  } else {
    return list
  }
}

function mergeSort (list) {
  if (list.length <= 1) {
    return list
  }

  let middle = Math.floor(list.length / 2)
  let left = list.slice(0, middle)
  let right = list.slice(middle, list.length)

  return merge(mergeSort(left), mergeSort(right))
}

function merge (left, right) {
  let result = []

  while (left.length || right.length) {
    if (left.length && right.length) {
      if (left[0] < right[0]) {
        result.push(left.shift())
      } else {
        result.push(right.shift())
      }
    } else if (left.length) {
      result.push(left.shift())
    } else {
      result.push(right.shift())
    }
  }
  return result
}

let arrayLength = [2, 3, 4, 6, 8, 12, 16, 32, 64, 128]

for (let x = 0; x < arrayLength.length; x++) {
  // creates a reference array of increasing digits
  let numsOrig = []
  for (let i = 0; i < arrayLength[x]; i++) {
    numsOrig.push(i)
  }

  // creates 100,000 randomized test arrays to sort
  let listOfLists = []
  let testArrays = 100000
  for (let i = 0; i < testArrays; i++) {
    let list = []
    let nums = [...numsOrig]
    while (nums.length > 0) {
      let rand = Math.floor(nums.length * Math.random())
      list.push(nums[rand])
      nums.splice(rand, 1)
    }
    listOfLists.push(list)
  }

  console.log("=====")
  let start = Date.now()
  for (let u = 0; u < testArrays; u++) {
    let testList = listOfLists[u]
    mergeSort(testList)
  }
  let end = Date.now()
  let timeElapsedMerge = end - start
  console.log(`merge time for array of length ${arrayLength[x]}: ${timeElapsedMerge} ms`)

  console.log("=====")
  start = Date.now()
  for (let u = 0; u < testArrays; u++) {
    let testList = listOfLists[u]
    bubbleSort(testList)
  }
  end = Date.now()
  let timeElapsedBubble = end - start
  console.log(`bubble time for array of length ${arrayLength[x]}: ${timeElapsedBubble} ms`)

  console.log(`RATIO: ${timeElapsedMerge / timeElapsedBubble}`)
}

Woof! That's a lot of code! It runs a hundred thousand tests each for bubble sort and merge sort at ever-increasing array lengths, then reports the times along with the ratio of merge sort's time to bubble sort's (a ratio > 1 means bubble sort did better; a ratio < 1 means merge sort did better). Here are my results for the first few tests, up to arrays of length six like the one I'd initially tested:



You can see that bubble sort performs better in each of these tests, though there's a weird tightening at length 3 that I still haven't figured out a theoretical reason for. Something about how the sorts behave on nearly-trivial inputs.

But, as hypothesized, you can see the speed arc bend for our RATIO as the array sizes increase:


Oh man, that ratio is so close! Can merge sort take the lead?


Wooo! Our rightful victor merge sort takes the lead at an array of length 2^7! But even here, the victory is hollow - we've sacrificed a lot of space to get a mere 23.3% speed advantage. As the array size increases, though - try it yourself! - you'll see merge sort pull further and further ahead.

So, in conclusion: theory should always guide you - but make sure you don't get trapped by theory. The vagaries of particular code (is there a way to do merge sort without calling the merge function like above?), particular languages, particular machines, and especially particular problems with their particular inputs and outputs can all mess with our expectations. Also, be kind to bubble sort - it might be better than you think.
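On that parenthetical: one common variation keeps the merge step but folds it into mergeSort and walks the two halves with indices instead of calling shift(). Since shift() removes the first element and slides every remaining element down a slot, the shift-based merge does hidden extra work on every comparison. A sketch (this is a standard textbook variant, not the book's code):

```javascript
// Merge sort with an inline, index-based merge - no shift() calls.
function mergeSort (list) {
  if (list.length <= 1) return list
  const mid = Math.floor(list.length / 2)
  const left = mergeSort(list.slice(0, mid))
  const right = mergeSort(list.slice(mid))

  const result = []
  let i = 0
  let j = 0
  // Take the smaller head element until one half runs out.
  while (i < left.length && j < right.length) {
    if (left[i] < right[j]) {
      result.push(left[i++])
    } else {
      result.push(right[j++])
    }
  }
  // Append whatever remains of the unexhausted half.
  while (i < left.length) result.push(left[i++])
  while (j < right.length) result.push(right[j++])
  return result
}

console.log(mergeSort([23, 4, 42, 15, 16, 8])) // [4, 8, 15, 16, 23, 42]
```

Whether that closes the gap at small array sizes is another experiment entirely - try swapping it into the harness above.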

 



