The perceptron is a foundational machine learning algorithm that can classify linearly separable data by learning an accurate model from training data. Understanding the perceptron is key to understanding neural networks. This article explains the perceptron algorithm and how it works.
Imagine a university that accepts candidates based on their grades and their entrance exam scores, and suppose we want to build a model that accurately predicts whether a candidate will be accepted, using past admissions data.
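To make this concrete, a small labelled dataset might look like the following sketch. The numbers are purely illustrative, chosen so that the two classes can be separated by a straight line; they are not drawn from any real admissions data.

```python
# Each sample: (grades, entrance exam score), paired with a label
# (1 = accepted, 0 = rejected). Values are illustrative only.
training_data = [
    ((0.9, 0.8), 1),
    ((0.7, 0.9), 1),
    ((0.3, 0.4), 0),
    ((0.2, 0.6), 0),
]
```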
There are two input variables: $x_1$, the candidate's grades, and $x_2$, their entrance exam score. To classify the data, let's assign values above 0 as being accepted and values below 0 as being rejected. As the data is linearly separable, our target function $f(x)$ can be modelled linearly. This means we can define the following equation for our model $h(x)$:

$$h(x) = w_1 x_1 + w_2 x_2 + b$$

where $w_1$ and $w_2$ are the weights applied to each input and $b$ is a bias term. We can find the decision boundary of the equation, which is where $h(x) = 0$.
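Rearranging the boundary equation shows that it is a straight line in the input space (assuming $w_2 \neq 0$):

$$w_1 x_1 + w_2 x_2 + b = 0 \quad\Longrightarrow\quad x_2 = -\frac{w_1}{w_2}x_1 - \frac{b}{w_2}$$

Points on one side of this line are classified as accepted, and points on the other side as rejected.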
Although this problem is trivial to solve graphically, solving it programmatically is more challenging. Gradient descent gives us a way to do it: starting from some initial parameters, it iteratively adjusts them, converging towards parameters that produce a more accurate model.
Let's initialise our desired function by wrapping the weighted sum in a step function, so the output is a clean accept/reject decision:

$$h(x) = H(w_1 x_1 + w_2 x_2 + b)$$

where $H(x)$ is the Heaviside step function, which outputs 1 when its input is greater than or equal to 0 and 0 otherwise.
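A minimal sketch of this model in Python, assuming the arbitrary starting parameters $w_1 = 1$, $w_2 = 1$, $b = 0$ (any initial values would do):

```python
def heaviside(z):
    """Heaviside step function: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

# Arbitrary starting parameters; the algorithm will tune these.
w1, w2, b = 1.0, 1.0, 0.0

def h(x1, x2):
    """Current model: the weighted sum of the inputs, passed through H."""
    return heaviside(w1 * x1 + w2 * x2 + b)

print(h(0.3, 0.4))  # 1, even though this candidate was rejected
```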
An output of 1 means this candidate would be accepted by the current function. As we have labelled data and the model's current output, we know which direction we want that output to move: the parameters need to be tuned so that the function outputs a lower value. This can be achieved by differentiating the function to find the rate of change of the output with respect to each parameter; we then know the effect of each parameter on the output, and how much to change each one to approach the target. This technique is known as gradient descent.
As the function has multiple parameters, we must use partial differentiation to find the rate of change for each parameter: each one is differentiated in turn while all of the other parameters are treated as constants. The partial derivative of an output $z$ with respect to a parameter $w$ is denoted by $\frac{\partial z}{\partial w}$.
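For our model, the quantity being differentiated is the weighted sum $z = w_1 x_1 + w_2 x_2 + b$, and its partial derivatives fall out directly:

$$\frac{\partial z}{\partial w_1} = x_1, \qquad \frac{\partial z}{\partial w_2} = x_2, \qquad \frac{\partial z}{\partial b} = 1$$

In other words, each weight affects the output in proportion to its input, and the bias affects it by a constant amount.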
Each step of gradient descent takes a small step in the direction of a better result. These steps allow us to optimise the function by shifting the parameters a little on each incorrect classification, over many samples.
We can find our new values for the parameters by shifting each one in proportion to its partial derivative and the prediction error: $w_1 \leftarrow w_1 + (y - \hat{y})\,x_1$, $w_2 \leftarrow w_2 + (y - \hat{y})\,x_2$, and $b \leftarrow b + (y - \hat{y})$, where $y$ is the correct label and $\hat{y}$ is the model's current output.
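As a sketch, a single update step in Python (reusing the hypothetical `heaviside` function and parameters from earlier) might look like this:

```python
def update(w1, w2, b, x1, x2, y):
    """Apply one perceptron update: shift each parameter by its
    partial derivative, scaled by the prediction error (y - y_hat)."""
    y_hat = heaviside(w1 * x1 + w2 * x2 + b)
    error = y - y_hat     # 0 if correct, +1 or -1 if misclassified
    w1 += error * x1      # dz/dw1 = x1
    w2 += error * x2      # dz/dw2 = x2
    b += error            # dz/db  = 1
    return w1, w2, b

# The rejected candidate (0.3, 0.4) was wrongly accepted (y_hat = 1,
# y = 0), so each parameter is pushed down: w1 -> 0.7, w2 -> 0.6, b -> -1.0.
w1, w2, b = update(1.0, 1.0, 0.0, 0.3, 0.4, y=0)
```

Note that when the model classifies a sample correctly, $y - \hat{y} = 0$ and the parameters are left unchanged; only misclassified samples trigger an update.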
If this is repeated over multiple training examples, a decision boundary can be found under which every point in the training data is correctly classified. These steps define the perceptron algorithm, and the perceptron convergence theorem guarantees that the algorithm will reach such a solution whenever the data is linearly separable.
Usually each adjustment is multiplied by a small fraction known as the learning rate, often written $\eta$, giving updates of the form $w_1 \leftarrow w_1 + \eta\,(y - \hat{y})\,x_1$. This helps the algorithm converge smoothly, making it easier to settle on precise parameters.
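Putting all of the pieces together, here is one possible end-to-end sketch of the perceptron algorithm, using the illustrative dataset from earlier and a small learning rate:

```python
def heaviside(z):
    """Heaviside step function: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def train_perceptron(data, learning_rate=0.1, max_epochs=100):
    """Train a perceptron on (features, label) pairs, stopping once
    every sample is classified correctly (or max_epochs is reached)."""
    w1, w2, b = 1.0, 1.0, 0.0  # arbitrary starting parameters
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), y in data:
            error = y - heaviside(w1 * x1 + w2 * x2 + b)
            if error != 0:
                # Nudge each parameter towards the correct label.
                w1 += learning_rate * error * x1
                w2 += learning_rate * error * x2
                b += learning_rate * error
                mistakes += 1
        if mistakes == 0:
            break  # converged: every training point is classified correctly
    return w1, w2, b

w1, w2, b = train_perceptron(training_data)
```

Because the illustrative data is linearly separable, the loop terminates as soon as a separating boundary is found.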
Through these steps we have implemented a basic machine learning algorithm, allowing the program to solve the problem with no knowledge of the problem domain beyond the labels on the training data. The perceptron algorithm can only classify linearly separable data; however, it serves as a building block of neural networks, which can solve far more complex problems.