Implementing Binary Search Algorithm in Python

Binary search works based upon the process of eliminating any numbers that are either above an upper bound or below a lower bound that a number cannot be.

Binary search is called binary search because it use the binary number sequence which starts from 1 and keeps on doubling. This uses the logarithmic scale with a base of 2.

In maths, logarithms are the inverse of exponents. 2 to the power of 10 equals 1,024. Have you ever noticed this is how many bytes there are in a megabyte or how many megabytes there are in a gigagyte?

As a further matter of interest to see how these numbers relate to things in computing, 2 to the power of 20 is 1,048,576 which is the number of rows there are in modern versions of Microsoft Excel spreadsheets so it turns out that the number of rows (and columns) wasn't just a number plucked out of the air.

Binary numbers are everywhere in computing.

In this example, I have asked someone to pick a number between 1 and a 1 million. For the purposes of the example, I have plucked 646,235 out of the air as being that chosen number. Since these are language agnostic concepts, I have then written some code in both Python and JavaScript which finds the number of guesses that it would take to arrive at that number using the binary search algorithm. I'll then illustrate the guessing process until the number is arrived at.

Python Implementation

# Calculates the number of `guesses` required to arrive at the `chosen_num`

import array

def binary_search(tmp_array, guess):
    guess_count = 1
    lower_bound = 0
    upper_bound = len(tmp_array)

    while lower_bound <= upper_bound:
        midpoint = (upper_bound + lower_bound) // 2

        if guess < midpoint:
            upper_bound = midpoint - 1
            guess_count += 1
        elif guess > midpoint:
            lower_bound = midpoint + 1
            guess_count += 1
        elif guess == midpoint:
            return guess_count


million_item_array = array.array('i', (i for i in range(1, 1000001)))
chosen_num = 646235

number_of_guesses = binary_search(million_item_array, chosen_num)

print(f'It took {number_of_guesses} guesses to get to the right number.')

 

JavaScript Implementation

// Calculates the number of `guesses` required to arrive at the `chosen_num`

function binarySearch(tmpArray,guess) {
  guessCount = 1;
  lowerBound = 0;
  upperBound = tmpArray.length;

  while (lowerBound <= upperBound) {
    midpoint = Math.floor((upperBound + lowerBound) / 2)

    if (guess < midpoint) {
      upperBound = midpoint - 1
      guessCount += 1
    };
    if (guess > midpoint) {
      lowerBound = midpoint + 1
      guessCount += 1
    };
    if (guess == midpoint) {
      return guessCount
    };
  }
}

millionItemArray = []
for (let i = 1; i <= 1000001; i++) {
  millionItemArray.push(i)
}
chosenNum = 646235

numberOfGuesses = binarySearch(millionItemArray, chosenNum)

console.log(`It took ${numberOfGuesses} guesses to get to the right number.`);

Let's look at how the algorithm simulates the guessing process undertaken by a smart human in this game. The smart human chooses the quotient portion of the midpoint between the lower and upper bound on each guess they have. Again, the target is 646,235.

Guess Lower Bound Upper Bound Midpoint Chosen Num Action
1 0 1,000,000 500,000 > Midpoint Eliminate nums <= midpoint
2 500,001 1,000,000 750,000 < Midpoint Eliminate nums >= midpoint
3 500,001 749,999 625,000 > Midpoint Eliminate nums <= midpoint
4 625,001 749,999 687,500 < Midpoint Eliminate nums >= midpoint
5 625,001 687,499 656,250 < Midpoint Eliminate nums >= midpoint
6 625,001 656,249 640,625 > Midpoint Eliminate nums <= midpoint
7 640,626 656,249 648,437 < Midpoint Eliminate nums >= midpoint
8 640,626 648,436 644,531 > Midpoint Eliminate nums <= midpoint
9 644,532 648,436 646,484 > Midpoint Eliminate nums <= midpoint
10 644,532 646,483 645,507 < Midpoint Eliminate nums >= midpoint
11 645,508 646,483 645,995 > Midpoint Eliminate nums <= midpoint
12 645,996 646,483 646,239 < Midpoint Eliminate nums >= midpoint
13 645,996 646,238 646,117 > Midpoint Eliminate nums <= midpoint
14 646,118 646,238 646,178 > Midpoint Eluminate nums <= midpoint
15 646,179 646,238 646,208 > Midpoint Eliminate nums <= midpoint
16 646,209 646,238 646,223 > Midpoint Eliminate nums <= midpoint
17 646,224 646,238 646,231 > Midpoint Eliminate nums <= midpoint
18 646,232 646,238 646,235 = Midpoint Found num on 18th guess!

 

At step 12, their new guess is so close and they don't even know it!

My computer was able to solve it a bit quicker than me. It took me about an hour to write this blog post and my computer did it in 0.162 seconds. I clearly must try harder!

So, how efficient is this algorithm?

Simply put, the greater the distance between the upper and lower bounds as a starting point, then the more efficient the algorithm becomes since many numbers can be eliminated within the first few guesses of the game (iterations of the while loop in the code). If you studied the numbers above, you will have noticed that more possible chosen numbers are eliminated at the start. In the example above, the first guess eliminates 500,000 possible numbers and the final guess only eliminated 7 numbers

The algorithm is said to be an O(log n) algorithm where the logarithm has a base of 2.

In mathematics, the logarithm of 626,235 with respect to the base of 2 is 19.26 (2d.p.). This can be calculated in a Python interpreter session as:

>>> import math
>>> ans = math.log(626235,2)
>>> ans
19.25634461676097
If you wanted to guesstimate how many guesses (iterations) it would take for someone (or the algorithm) to arrive at the chosen number, then this is the calculation that you would undertake.
 

Resources

Although not specifically about the binary search algorithm, for a really good half an hour talk on algorithms and Big-O notation, then I encourage you to check out the following talk by Ned Batcheldor at PyCon US 2018.

Big-O: How Code Slows as Data Grows