Machine Learning: Chapter 7 - “Deep” Neural Networks and “Hidden layers”
In chapter three we introduced the idea of constructing models using neural networks. This post is going to expand upon this idea. By the end of this post my goal is for you to have a better understanding of what exactly is meant by the term “Deep” Neural Networks.
Before we talk about this though, I’d like to spend a bit more time discussing the “flexibility” of a model and revisit the Complexity Bias Tradeoff from the first Neural Network post.
What does it mean for a model to be “Flexible”?
Determining a model’s flexibility is more of an art than a science, in my experience. It’s also easier to think about a model’s flexibility relative to another model than to make absolute claims about it.
Let’s look at some of the previous iterations of our model to make these comparisons:
Espressos vs Productivity: 2 model parameters
Espressos, Exercise, Sleep, Breakfast vs Productivity: 7 model parameters
In this case, we’d say that the second model is more flexible because it has the ability to learn a more complex relationship for productivity, since it takes several more inputs into account. At the same time, though, we need more data to learn the values of the model parameters.
An imperfect heuristic for model flexibility is the number of parameters the model has. The more of these tunable values your model has, the more complex the relationships it can potentially learn, at the cost of requiring more time and data to learn those relationships.
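To make that heuristic concrete, here’s a minimal Python sketch (not from any earlier post; the layer sizes are just illustrative) that counts the weights and biases in a plain fully connected network:

```python
def count_parameters(layer_sizes):
    """Count the weights and biases in a fully connected network.

    layer_sizes like [3, 2, 1] means 3 inputs, a hidden layer of 2 neurons,
    and 1 output neuron. Each layer contributes (inputs * outputs) weights
    plus one bias per output neuron.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total

print(count_parameters([1, 1]))        # 1 input -> 1 output: 2 parameters
print(count_parameters([3, 2, 1]))     # 3 inputs, hidden layer of 2, 1 output: 11 parameters
print(count_parameters([3, 8, 8, 1]))  # two hidden layers of 8: 113 parameters
```

Adding inputs or layers makes the parameter count, and therefore the flexibility by this heuristic, grow quickly.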
Let’s say we want the lowest-bias model possible and we are not short on data or computing power. The next section will dive into how deep neural networks can turn the dial up on complexity to construct even better models.
Constructing Deep Networks
Up until this point, our neural network had just input neurons and an output neuron. But we can dial up the flexibility of our model by adding so-called “hidden layers” to our network. We call these layers hidden because we never use their values directly; they’re an intermediate step on our way to constructing a prediction. These hidden layers are where deep neural networks and deep learning get the “deep” designation from: the more layers a neural network has, the deeper it is.
Enough talk. Let’s dive into a concrete example.
In the demo below, we have 3 predictors (espressos, hours of sleep and gym), 1 hidden layer with 2 neurons and finally 1 output neuron which outputs the prediction for productivity.
As I’ve mentioned previously, even though our neural network is more complicated now, we can still write out our model as an equation:
A couple of things to note about this equation:
The labels at the top (e.g. yModel) call out which parts of the equation correspond to which parts of the neural network.
The predictors are referred to by the first letter of their name (e.g. xE refers to the number of espressos)
The weights and biases in this equation have been populated randomly so that it looks a bit more interesting
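Written out symbolically, with w standing in for the weights and b for the biases instead of those random numbers, the equation for this network has roughly the following shape (this is my shorthand, not the exact labels from the demo):

```latex
h_1 = w_{E1} x_E + w_{S1} x_S + w_{G1} x_G + b_1
h_2 = w_{E2} x_E + w_{S2} x_S + w_{G2} x_G + b_2
y_{Model} = w_{O1} h_1 + w_{O2} h_2 + b_O
```

Here h1 and h2 are the values of the two hidden neurons; substituting the first two lines into the last one gives the full expression for yModel in terms of the three predictors.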
I think this is a good spot to stop and think about flexibility again: this equation now gives us the possibility to attach weights to particular combinations of the input variables. (Strictly speaking, for this model it doesn’t actually improve the expressive power, but we’ll see in future posts about activation functions that these hidden layers do add expressive power when applied correctly.)
Calculating the prediction for a neural network is commonly referred to as forward propagation. This is because nodes later in the network depend on the values of the previous layers. So you start with the values of the predictors on the left-hand side, and they “propagate forward” as we multiply them by the weights of the edges that apply to them. Technically we were doing forward propagation in our previous models too, but now that we have a hidden layer the name makes a bit more sense.
Let’s see this in action:
One important takeaway that I’d like to come back to is that our model is still just an equation that we’re evaluating. Propagating values through the nodes is the same thing as evaluating the expression above.
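To make that concrete, here’s a small NumPy sketch of forward propagation through the 3-input, 2-hidden-neuron, 1-output network above. The specific weights, biases and input values below are made up for illustration; they aren’t the ones from the demo:

```python
import numpy as np

# Inputs: espressos, hours of sleep, gym sessions (made-up example values).
x = np.array([2.0, 7.5, 1.0])

# Weights and biases for the hidden layer (one row per hidden neuron).
W_hidden = np.array([[ 0.4, -0.2, 0.7],
                     [-0.5,  0.3, 0.1]])
b_hidden = np.array([0.1, -0.3])

# Weights and bias from the two hidden neurons to the output neuron.
w_out = np.array([0.6, -0.4])
b_out = 0.2

# Forward propagation: each layer's values feed into the next layer.
h = W_hidden @ x + b_hidden   # values of the two hidden neurons
y = w_out @ h + b_out         # predicted productivity

print("hidden neuron values:", h)
print("predicted productivity:", y)
```

Each line of the calculation is just one piece of the equation from earlier, evaluated with numbers plugged in.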
So there we have it, that’s a “deep” neural network. To recap the main points:
Flexibility is a measure of a model’s ability to represent complex relationships between the inputs and outputs
Flexibility comes at the cost of more data and more time required to train models
Deep Neural Networks have hidden layers of neurons whose values we never use directly; they’re an intermediate step toward the prediction
Calculating a prediction on a neural network is referred to as forward propagation
In the next post my goal is to revisit gradient descent in the context of neural networks with hidden layers.