Searching through collections of data is something computers have to do all the time.
It happens every time you type in a search on Google, or when you type in a file name to search for on your computer.
Computers deal with such huge amounts of data that we need fast algorithms to help us find information quickly.
Lets investigate searching with a game...
You may have noticed that the numbers on the boxes in the game were in a random order, which meant that finding the target number was basically luck!
You might have found it on your first try, or if you were less lucky you might have had to look inside almost all the boxes before you found it.
This might not seem like such a bad thing since you had enough lives to look under all the boxes, but imagine if there had been 1,000 boxes, or worse 1,000,000!
It would have taken far too long to look through all the boxes and the target number might have never been found.
Now this next game is slightly different.
You have fewer lives, which makes things a bit more challenging, but this time the numbers inside the boxes will be in order.
The box with the smallest number is on the far left, and the one with the largest number is on the far right.
Let's see if you can find all the target numbers without running out of lives...
Now that you have played through the whole game (and hopefully found all of the target numbers!) you may have noticed that even though you had less lives in the second part of the game, and lots of boxes to search through, you were still able to find the target number. Why was this possible?
Since the boxes in the first game were in a random order there really wasn't any strategy you could have used to find the target number, except simply keep opening boxes one by one until you found the target number.
This is essentially the linear search algorithm (sometimes called a sequential search).
In simpler terms, linear search algorithm is as follows:
Check if the first item in a list is the item you are searching for, if it is the one you are looking for, you are done.
If it isn't the item you are searching for move on and check the next item.
Continue checking items until you find the one you are searching for.
If you used this algorithm you might get lucky and find what you are looking for on your first go, but if you were really unlucky you might have to look through everything in your list before you found the right object!
For a list of 10 items this means on average you would only have to look at 5 items to find what you were looking for, but for a list of 10000 you would have to look through on average 5000.
How is bozo search different from linear search?
If you watched the video at the beginning of the chapter you might be thinking that what you did in the box searching game sounds more like bozo search than linear search, but actually bozo search is even sillier than this!
If you were doing a bozo search then after opening a box and finding the wrong number, you would close the box back up and try another one at random!
This means you might end up checking the same box again and again and again and you might never find the target number, even with a small number of boxes!
A much better algorithm to use is called binary search.
In the second part of the box searching game the boxes were in order, which meant you were able to be more clever when you were searching for the target number, and you might have been using a binary search without realising!
If you used a binary search on each of the levels then you would have always had enough lives to find the target number!
Informally, the binary search algorithm is as follows:
Look at the item in the centre of the list and compare it to what you are searching for
If it is what you are looking for then you are done.
If it is larger than the item you are looking for then you can ignore all the items in the list which are larger than that item (if the list is from smallest to largest this means you can ignore all the items to the right of the centre item).
If it is smaller then you can ignore all the items in the list which are smaller than that centre item.
Now repeat the algorithm on the remaining half of the list, checking the middle of the list and choosing one of the halves, until you find the item you are searching for.
Binary search is a very powerful algorithm.
If you had 1000 boxes to search through it would take you at most 10 checks for binary search to find something and linear search would take at most 1000 checks, but if you doubled the number of boxes to search through how would this change the number of checks made by binary search and linear search?
How does doubling the number of boxes affect the number of checks required?
The answer to the above question is that the maximum number of checks for linear search would double, but the maximum number for binary search would only increase by one.
It is important to remember that you can only perform a binary search if the items you are searching through are sorted into order.
This makes the sorting algorithms we will look at next even more important because without sorting algorithms we wouldn't be able to use binary search to quickly look through data!
Code to run linear and binary search for yourself
The following files will run linear and binary search in various languages; you can use them to generate random lists of values and measure how long they take to find a given value.
Your project is to measure the amount of time taken as the number of items (n) increases; try drawing a graph showing this.
A Scratch implementation of linear and binary search can be downloaded below.
The following Python implementations of linear and binary search can be run in your browser: