Speed Testing - A Beginner's Primer

For my final project at Flatiron, I scraped NFL scores off of ESPN's website, basically by downloading the entire html of the webpages I needed.  This meant that I had a lot of text to search through to find the scores that I needed - more than 20,000+ characters - and I needed to do this up to 32 times to find each team's score.  My initial attempt to parse the html involved converting the html to a string and using .slice to create a bunch of smaller strings that I could pick through looking for markups that indicated a team score was nearby.  This was slow - it took up to 20 seconds to find everything which was unacceptably slow, especially given that this one merely one step in getting users their score data.

My solution?

Using Ruby's .index command to find the markups I was looking for, and only then slicing off the data I needed.  Using this method, I reduced the time to generate my score data from 20 seconds to less than 1.



I've always been interested in seeing how fast I could make code go ever since I found that my classmates and I could run speed tests to compare our different solutions to various whiteboard problems.  The differences usually only amounted to a couple hundred milliseconds at most, though, so the importance of writing speedy code never seemed real to me.  Once I ran my NFL score scraping code, though, I had an actual concrete example of the value - cementing my passion for speedy bytes.

Speed testing has some pitfalls though - it's really easy to get incorrect results when you're testing competing ways of coding a problem or feature.  Here are my biggest tips: 

1.  Make sure to run an adequate number of tests

The internal data that commands like Date.now() return are not completely accurate - they can often be off by 15 milliseconds or so.  If you're comparing code that takes only 3 or 4 milliseconds to run, the runtime that your computer calculates is going to be much more influenced by that measurement error than the actual runtime of your code.  How do you fix this?  Run your code again and again and again.  Here's how I compared two different profit maximization algorithms:


Notice that I ran each function 100,000,000 times for three different inputs, meaning that I ran each function three hundred million times.  The difference in speed between those functions?  Less than 700 milliseconds.  Running each code even a couple thousand times wouldn't be enough to truly determine which one was faster.

2. Make sure that your computer is only working on one thing at once

While we like to think of code as living in some beautiful, mathematical ideal world of logic, in reality it runs on your computer.  In meatspace, sometimes objects have different properties from one moment to the next.  And if your computer is streaming a youtube video while running one bit of code and not another, that's going to slow down your code.  A fair test is a test where your computer is not multitasking.  You might even want to run your speedtest a couple different times just to make sure you don't have background processes messing with your results.

3. Test in different environments (bonus points)

Different code runs different in different environments - sometimes a function runs quickly in Chrome but slowly in Safari.  If you don't test it out in all sorts of different environments, you may inadvertently have optimized your code for an environment your users don't even use.



Some general tips that can help speed up your programs:

A. Use hashes, sets, and maps as opposed to arrays - While it's true that the Big O notational time of a program doesn't *always* predict whether code A will be faster than code B, let's be real here: hashes, sets, and maps are O(1).  There's very few situations where searching through an array is going to be faster than looking up a value in a hash.

B. Avoid converting inputs from one data type to another - Less work for your computer means faster running time.

C. Look for early exits - If there are edge cases (or even regular cases) where your program can finish up early, give it the ability to do so.  For instance, if you have a function that returns a false value if your input array is empty, don't waste your computer's time performing all sorts of calculations on that array - stick an if-function in there and tell it to get the hell out of dodge and return false if it's fed that empty array.


Good luck out there, racers!

Comments

Popular posts from this blog

The Sorting Hat

Kadane's Algorithm

Loose Equality, the Null Operator, and Other Oddities