Kadane's Algorithm

Imagine it is 1977.  You do not have a degree in computer science.  You have not spent months grinding leetcode.  You are given a algorithm problem that a decorated mathematician has already created what looks like an optimal solution.

And then in less than a minute you come up with a better answer.

You are Jay Kadane, and the problem that you have solved is the maximum subarray problem.

The maximum subarray problem gives you an array of numbers and asks you to pick out the subarray with the largest sum.  The problem is trivial if all the numbers are positive:

[1, 5, 1, 3, 2] <- the largest subarray is obviously the entire array

But what if you've got some negative numbers thrown in there?

[1, 5, -1, 3, -2] <- sure, you don't want the negatives. But isn't taking the -1 worth the +3?

A brute-force solution materializes pretty easily: just create every subarray and find the biggest sum:

[1] = 1
[1, 5] = 6
[1, 5, -1] = 5
[1, 5, -1, 3] = 8
[1, 5, -1, 3, -2] = 6
[5] = 5
[5, -1] = 4
[5, -1, 3] = 7
[5, -1, 3, -2] = 5
[-1] = -1
[-1, 3] = 2
[-1, 3, -2] = 0
[3] = 3
[3, -2] = 1
[2] = 2

The answer is [1, 5, -1, 3]! Easy!

Well, easy enough for a small array.  Since there were 5 elements in the full array, that meant that there were 5 possible subarrays that started with the first element (1). And 4 subarrays for the next element (5). 3 for the next, 2, and 1.  If you play around with a couple arrays of different sizes or just plain recognize the pattern, you'll see that the total number of possible subarrays for an array of length n is always the nth triangular number.  And the nth triangular number is calculated as n*(n+1)/2, which involves squaring n...which gives us a big O time of n^2.  Then we need to find the biggest sum out of those n*(n+1)/2 subarrays, which tacks on another n, leaving us with a fairly poor final big O of O(n^3).

Oof.

Kadane's solution is much easier, and much faster - in fact, it's optimally fast.

What we do in Kadane's algorithm is we continuously calculate the value of subarrays as we go through the whole array:

function maxSubarrayKadane(nums) {
currentSum = 0
maxSum = 0
startIndex = 0
endIndex = 0
bestStart = 0
bestEnd = 0

for (let i=0; i< nums.length; i++) {
if (currentSum <=0) {
startIndex = i
endIndex = i
currentSum = nums[i]
} else {
currentSum += nums[i]
endIndex = i
}
if (currentSum > maxSum) {
maxSum = currentSum
bestStart = startIndex
bestEnd = endIndex + 1
}
}
return nums.slice(bestStart, bestEnd)
};

console.log(maxSubarray([-2,1,-3,3,-2,4,1,-5,4]))
//stick a larger number after the max subarray and see if it gets picked up
console.log(maxSubarray([-2,1,-3,3,-2,4,1,-5,40]))
//stick the large number before and see if it gets noticed
console.log(maxSubarray([-2,40,-3,3,-2,4,1,-5,4]))
//the rest of the subarray doesn't outweigh the -30
console.log(maxSubarray([-2,40,-30,3,-2,4,1,-5,4]))

This seems like magic! How does it work?  There's actually some simple logic behind the whole thing.  First, we know that we want to get rid of negative numbers, but not when it comes at the cost of a bigger number after them.  So we have two scenarios: One, our maximum subarray has no negative numbers.  The algorithm above will find the beginning of that subarray since it's either at the beginning of the array or it comes after a negative number that cancels out the previous numbers in whatever subarray we were working with before.  And then it will find the end, since maxSum updates when currentSum gets bigger.

Or two, our maximum subarray has negative numbers.  If it does, we can note that the subarray will be composed of smaller subarrays*, each with a positive sum, separated by those negative numbers. Our maximum subarray in our first console.log from above is this:


[ 3, -2, 4, 1 ]


We can see two smaller arrays ( ] and 41 ]) that each have a positive sum.  And there's a -2 after the 3, which is worth keeping just so that we can combine the 3 and the [4, 1].  Basically, we want to stitch together as many positive arrays as possible, just so long as they're not outweighed by the negative numbers in-between.  And if we do run into a negative that outweighs the previous sum, we start over because it's not worth keeping the previous subarray.

So, instead of summing every subarray possible, we only need to find positive subarrays, and then we can smoosh them together if it's worth it.  And since we can figure out if it's worth it simply by keeping a running total of the value of whatever subarray we're working with, this can be done as we go through the array without requiring any combinatorics.

Let's run the algorithm through the array to see how this works:

[-2] <- first number we encounter is negative. No good.
[1] <- We have a positive number! This is our current largest subarray.
[1, -3] <- The -3 outweighs the 1. We will discard this array.
[3] <- Aha! A positive number larger than our previous sum. The currentSum > maxSum logic
marches into action and we restart our tracking.
[3, -2] <- A smaller sum, but we don't have a negative number yet...we want to keep this
array around to tack on to the beginning of any positive arrays we encounter
later, since that will make a bigger sum.
[3, -2, 4] <- It pays off! Our maximum sum is now 5, 2 bigger than the sum for [3].
[3, -2, 4, 1] <- Again we have a bigger currentSum than maxSum and we note the
larger array.
[3, -2, 4, 1, -5] <- Our current sum is now 3. No resetting of maxSum.
[3, -2, 4, 1, -5, 4] <- We reach the end. The 4 is not enough to outweigh the 5 and we do
not note a new end for the subarray.

And thus we have reached the end, having touched each element in the array only once, for a total of O(n) time. Neat!

*Unless the entire array is just negative numbers, in which case our largest subarray will be empty.  In the function we have here, since currentSum will never be bigger than 0, our start and end indices will never update and our return will just give us an empty set.

Comments

Popular posts from this blog

The Sorting Hat

Loose Equality, the Null Operator, and Other Oddities