Searching - Algorithms - Computer Science Field Guide

Searching through collections of data is something computers have to do all the time. It happens every time you type in a search on Google, or when you type in a file name to search for on your computer, or when you pay at an EFTPOS machine and the system looks up your account number. Computers deal with such huge amounts of data that we need fast algorithms to help us find information quickly.

Lets investigate searching with a game...

Thumbnail of Searching Algorithms interactive

Searching Boxes - Part 1

You may have noticed that the numbers on the boxes in the game were in a random order, which meant that finding the target number was basically luck! You might have found it on your first try, or if you were less lucky you might have had to look inside almost all the boxes before you found it. This might not seem like such a bad thing since you had enough lives to look under all the boxes, but imagine if there had been 1,000 boxes, or worse 1,000,000! It would have taken far too long to look through all the boxes and the target number might have never been found.

Now this next game is slightly different. You have fewer lives, which makes things a bit more challenging, but this time the numbers inside the boxes will be in order. The box with the smallest number is on the far left, and the one with the largest number is on the far right. Let's see if you can find all the target numbers without running out of lives...

Searching Boxes - Part 2

Now that you have played through the whole game (and hopefully found all of the target numbers!) you may have noticed that even though you had fewer lives in the second part of the game, and lots of boxes to search through, you were still able to find the target number. Why was this possible?

2.2.1.

Linear search

Since the boxes in the first game were in a random order there really wasn't any strategy you could have used to find the target number, except simply keep opening boxes one by one until you found the target number. This is essentially the linear search algorithm (sometimes called a sequential search). In simpler terms, linear search algorithm is as follows:

Check if the first item in a list is the item you are searching for, if it is the one you are looking for, you are done.
If it isn't the item you are searching for move on and check the next item.
Continue checking items until you find the one you are searching for.

If you used this algorithm you might get lucky and find what you are looking for on your first go, but if you were really unlucky you might have to look through everything in your list before you found the right object! For a list of 10 items this means on average you would only have to look at 5 items to find what you were looking for, but for a list of 10,000 you would have to look through on average 5,000.

How is bozo search different from linear search? Curiosity

If you watched the video at the beginning of the chapter you might be thinking that what you did in the box searching game sounds more like bozo search than linear search, but actually bozo search is even sillier than this! If you were doing a bozo search then after opening a box and finding the wrong number, you would close the box back up and try another one at random! This means you might end up checking the same box again and again and again and you might never find the target number, even with a small number of boxes!

2.2.2.

Binary search

A much better algorithm to use is called binary search. In the second part of the box searching game the boxes were in order, which meant you were able to be more clever when you were searching for the target number, and you might have been using a binary search without realising!

If you used a binary search on each of the levels then you would have always had enough lives to find the target number! Informally, the binary search algorithm is as follows:

Look at the item in the centre of the list and compare it to what you are searching for
If it is what you are looking for then you are done.
If it is larger than the item you are looking for then you can ignore all the items in the list that are larger than that item (if the list is from smallest to largest this means you can ignore all the items to the right of the centre item).
If it is smaller then you can ignore all the items in the list that are smaller than that centre item.
Now repeat the algorithm on the remaining half of the list, checking the middle of the list and choosing one of the halves, until you find the item you are searching for.

Binary search is a very powerful algorithm. If you had 1,000 boxes to search through it would take you at most 10 checks for binary search to find something and linear search would take at most 1,000 checks, but if you doubled the number of boxes to search through how would this change the number of checks made by binary search and linear search?

How does doubling the number of boxes affect the number of checks required? Spoiler!

The answer to the above question is that the maximum number of checks for linear search would double, but the maximum number for binary search would only increase by one. This is known as a logarithmic relationship, and you'll learn more about that in the Complexity and Tractability chapter.

It is important to remember that you can only perform a binary search if the items you are searching through are sorted into order. This makes the sorting algorithms we will look at next even more important because without sorting algorithms we wouldn't be able to use binary search to quickly look through data!

Code to run linear and binary search for yourself Project

The following files will run linear and binary search in various languages; you can use them to generate random lists of values and measure how long they take to find a given value. Your project is to measure the amount of time taken as the number of items (n) increases; try drawing a graph showing this.

Algorithms 2.2. Searching

Algorithms

Linear search

Binary search

Algorithms
2.2. Searching