Generate human faces with machine learning in 100 lines of code.
If you have ever needed a picture of a fake person, you might know the following website:
This Person Does Not Exist (thispersondoesnotexist.com)
Now, imagine that you were asked to design their algorithm: generating random pictures of fake people. To draw a face, you would first need to define what a face is, along with its characteristics and attributes. Then, for each attribute, you would have to define its possible colors, shapes, proportions, and so on. Finally, and probably the most difficult challenge, you would need to assemble all of them in a coherent way. Are you ready for the challenge?
Well, putting aside the handcrafted approach, let us focus on generative models, and more specifically on Variational Autoencoders (VAEs). Generative models are often neural networks that can randomly generate new data points similar to the ones they were trained on; they are trained with unsupervised learning.
In this post, we will implement the famous Auto-Encoding Variational Bayes paper in about 100 lines of code.
VAEs train a decoder that defines a stochastic mapping from a latent (noise) variable z to the target data x (here, pictures of human faces). The decoder is parametrized by θ, which is learned by backpropagation, so no handcrafted engineering is required: the model learns useful patterns directly from the training data. To learn θ, an encoder parametrized by ϕ is introduced; it defines a stochastic mapping from x to z. The parameters θ and ϕ are trained jointly by maximizing a lower bound 𝓛 on the total log marginal likelihood of the training set:
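For reference, the paper writes this bound per datapoint x⁽ⁱ⁾. Here q_ϕ(z|x) is the encoder, p_θ(x|z) the decoder and p_θ(z) = N(0, I) the prior over z. In practice the expectation is estimated with L noise samples through the reparametrization trick, and the training loss is the negative of that estimate:

```latex
\begin{aligned}
\log p_\theta\big(\mathbf{x}^{(1)}, \dots, \mathbf{x}^{(N)}\big)
  &= \sum_{i=1}^{N} \log p_\theta\big(\mathbf{x}^{(i)}\big)
  \;\geq\; \sum_{i=1}^{N} \mathcal{L}\big(\theta, \phi; \mathbf{x}^{(i)}\big), \\
\mathcal{L}\big(\theta, \phi; \mathbf{x}^{(i)}\big)
  &= -D_{KL}\big(q_\phi(\mathbf{z} \mid \mathbf{x}^{(i)}) \,\|\, p_\theta(\mathbf{z})\big)
   + \mathbb{E}_{q_\phi(\mathbf{z} \mid \mathbf{x}^{(i)})}\big[\log p_\theta(\mathbf{x}^{(i)} \mid \mathbf{z})\big], \\
\widetilde{\mathcal{L}}\big(\theta, \phi; \mathbf{x}^{(i)}\big)
  &= -D_{KL}\big(q_\phi(\mathbf{z} \mid \mathbf{x}^{(i)}) \,\|\, p_\theta(\mathbf{z})\big)
   + \frac{1}{L} \sum_{l=1}^{L} \log p_\theta\big(\mathbf{x}^{(i)} \mid \mathbf{z}^{(i,l)}\big), \\
\mathbf{z}^{(i,l)} &= \boldsymbol{\mu}^{(i)} + \boldsymbol{\sigma}^{(i)} \odot \boldsymbol{\epsilon}^{(l)},
  \qquad \boldsymbol{\epsilon}^{(l)} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}).
\end{aligned}
```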
The encoder and decoder architectures are very similar. Given some inputs, each outputs the parameters (a mean and a log-variance) of a multivariate Gaussian distribution with diagonal covariance. These are the encoder distribution q_ϕ(z|x) and the decoder distribution p_θ(x|z).
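The paper's appendix specifies this shared architecture as a "Gaussian MLP" with one tanh hidden layer. Written for the decoder, using the paper's weight indexing, it reads as follows. For the encoder, x and z swap roles, and the weights and biases become the variational parameters ϕ:

```latex
\begin{aligned}
\log p_\theta(\mathbf{x} \mid \mathbf{z}) &= \log \mathcal{N}\big(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\sigma}^2 \mathbf{I}\big), \\
\boldsymbol{\mu} &= \mathbf{W}_4 \mathbf{h} + \mathbf{b}_4, \\
\log \boldsymbol{\sigma}^2 &= \mathbf{W}_5 \mathbf{h} + \mathbf{b}_5, \\
\mathbf{h} &= \tanh\big(\mathbf{W}_3 \mathbf{z} + \mathbf{b}_3\big).
\end{aligned}
```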
Due to the similarity between the encoder and decoder, we will use the same piece of code to define them:
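Below is a minimal NumPy sketch of such a shared class. It assumes a one-hidden-layer tanh MLP, as in the paper's Gaussian MLP, and it is not the article's own code. The class name `GaussianMLP`, the method `gaussian_params` and the layer sizes are hypothetical. The input size of 560 assumes 28 x 20 pixel Frey Face images, and 200 hidden units with a 2-dimensional latent space are illustrative choices. Only the mapping from inputs to Gaussian parameters is shown.

```python
import numpy as np


class GaussianMLP:
    """Hypothetical shared encoder/decoder: a one-hidden-layer tanh MLP that maps
    inputs to the mean and log-variance of a diagonal Gaussian.

    Uses the row-vector convention (inputs @ W) instead of the paper's W @ z.
    """

    def __init__(self, input_dim, hidden_dim, output_dim, rng=None):
        # Fixed seed by default so the sketch is reproducible
        rng = np.random.default_rng(0) if rng is None else rng
        scale = 0.01  # small random initial weights, zero biases
        self.W_h = scale * rng.standard_normal((input_dim, hidden_dim))
        self.b_h = np.zeros(hidden_dim)
        self.W_mu = scale * rng.standard_normal((hidden_dim, output_dim))
        self.b_mu = np.zeros(output_dim)
        self.W_lv = scale * rng.standard_normal((hidden_dim, output_dim))
        self.b_lv = np.zeros(output_dim)

    def gaussian_params(self, inputs):
        """Return (mean, log_var) for inputs of shape (batch, input_dim)."""
        h = np.tanh(inputs @ self.W_h + self.b_h)
        mean = h @ self.W_mu + self.b_mu
        log_var = h @ self.W_lv + self.b_lv
        return mean, log_var


# The same class serves as encoder q_phi(z|x) and as decoder p_theta(x|z)
encoder = GaussianMLP(input_dim=560, hidden_dim=200, output_dim=2)
decoder = GaussianMLP(input_dim=2, hidden_dim=200, output_dim=560)
```

Sharing one class keeps the two networks symmetric. Only the input and output sizes differ.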
After careful review of the loss, you will notice that it requires sampling from the encoder. At the same time, once training is done, our main interest is to randomly sample new data points from the decoder. Given some noise, the forward function uses the reparametrization trick to sample these data points in a differentiable way. The loss also requires evaluating the log-likelihood of the data points x given z, which is the job of the compute_log_density function. Finally, a major term in the loss is the Kullback-Leibler divergence between the encoder distribution and the prior distribution over the latent variables z. This term is computed by the compute_KL function.
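The three operations above can be sketched as standalone NumPy functions. In the article they are methods of the encoder/decoder class, and the sampling step lives in forward. Here, `reparametrize` is a hypothetical name for that step. The KL term uses the closed form for a diagonal Gaussian against the standard normal prior N(0, I), which the paper derives for this case:

```python
import numpy as np


def reparametrize(mean, log_var, eps):
    """Reparametrization trick: z = mean + sigma * eps, with eps ~ N(0, I).

    The randomness is isolated in eps, so z is a differentiable function
    of mean and log_var.
    """
    return mean + np.exp(0.5 * log_var) * eps


def compute_log_density(x, mean, log_var):
    """log N(x; mean, diag(exp(log_var))), summed over dimensions (one value per row)."""
    return -0.5 * np.sum(
        np.log(2.0 * np.pi) + log_var + (x - mean) ** 2 / np.exp(log_var), axis=-1
    )


def compute_KL(mean, log_var):
    """Closed-form KL(N(mean, diag(exp(log_var))) || N(0, I)), one value per row."""
    return -0.5 * np.sum(1.0 + log_var - mean ** 2 - np.exp(log_var), axis=-1)


# Usage: one differentiable sample per row, then the two loss terms
rng = np.random.default_rng(0)
mean, log_var = np.zeros((2, 3)), np.zeros((2, 3))
z = reparametrize(mean, log_var, rng.standard_normal((2, 3)))
kl = compute_KL(mean, log_var)  # zero: the encoder matches the prior here
```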
For a moment, imagine that all these pieces are implemented. We then have all the ingredients required to design the optimization algorithm:
At each iteration, we sample a random minibatch from the training dataset. We then sample some noise, which is fed to the encoder together with the minibatch to sample latent variables z. These sampled latent variables and the minibatch are used to compute the training loss, and θ and ϕ are then updated by backpropagation.
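The loop just described can be sketched end to end in NumPy. This is a minimal sketch under several assumptions and not the article's implementation. It writes the backward pass by hand instead of using an autodiff framework, and it draws one noise sample per datapoint. It also uses Adagrad updates, which is an assumption: any gradient-based optimizer would do. The names `init_params`, `loss_and_grads` and `train`, as well as all sizes and hyper-parameters, are hypothetical.

```python
import numpy as np


def init_params(data_dim, hidden_dim, latent_dim, scale=0.01, seed=0):
    """Random weights (and, for simplicity, biases) for both Gaussian MLPs."""
    rng = np.random.default_rng(seed)
    shapes = {
        # encoder q_phi(z|x): parameters phi
        "W1": (data_dim, hidden_dim), "b1": (hidden_dim,),
        "W2": (hidden_dim, latent_dim), "b2": (latent_dim,),
        "W3": (hidden_dim, latent_dim), "b3": (latent_dim,),
        # decoder p_theta(x|z): parameters theta
        "V1": (latent_dim, hidden_dim), "c1": (hidden_dim,),
        "V2": (hidden_dim, data_dim), "c2": (data_dim,),
        "V3": (hidden_dim, data_dim), "c3": (data_dim,),
    }
    return {k: scale * rng.standard_normal(s) for k, s in shapes.items()}


def loss_and_grads(p, x, eps):
    """Negative ELBO averaged over the minibatch, and its hand-derived gradients."""
    n = x.shape[0]
    # Encoder: mean and log-variance of q_phi(z|x)
    h1 = np.tanh(x @ p["W1"] + p["b1"])
    mu, lv = h1 @ p["W2"] + p["b2"], h1 @ p["W3"] + p["b3"]
    # Reparametrization trick
    s = np.exp(0.5 * lv)
    z = mu + s * eps
    # Decoder: mean and log-variance of p_theta(x|z)
    h2 = np.tanh(z @ p["V1"] + p["c1"])
    mx, lvx = h2 @ p["V2"] + p["c2"], h2 @ p["V3"] + p["c3"]
    inv_var = np.exp(-lvx)
    rec = 0.5 * np.sum(np.log(2 * np.pi) + lvx + (x - mx) ** 2 * inv_var, axis=1)
    kl = -0.5 * np.sum(1 + lv - mu ** 2 - np.exp(lv), axis=1)
    loss = np.mean(rec + kl)

    g = {}
    # Backward through the decoder (reconstruction term)
    d_mx = (mx - x) * inv_var / n
    d_lvx = 0.5 * (1 - (x - mx) ** 2 * inv_var) / n
    g["V2"], g["c2"] = h2.T @ d_mx, d_mx.sum(0)
    g["V3"], g["c3"] = h2.T @ d_lvx, d_lvx.sum(0)
    d_a2 = (d_mx @ p["V2"].T + d_lvx @ p["V3"].T) * (1 - h2 ** 2)
    g["V1"], g["c1"] = z.T @ d_a2, d_a2.sum(0)
    d_z = d_a2 @ p["V1"].T
    # Backward through the sampling step, plus the KL term
    d_mu = d_z + mu / n
    d_lv = d_z * 0.5 * s * eps + 0.5 * (np.exp(lv) - 1) / n
    g["W2"], g["b2"] = h1.T @ d_mu, d_mu.sum(0)
    g["W3"], g["b3"] = h1.T @ d_lv, d_lv.sum(0)
    d_a1 = (d_mu @ p["W2"].T + d_lv @ p["W3"].T) * (1 - h1 ** 2)
    g["W1"], g["b1"] = x.T @ d_a1, d_a1.sum(0)
    return loss, g


def train(data, latent_dim=2, hidden_dim=20, batch_size=32, n_iters=1000,
          lr=0.05, seed=0):
    """Sample a minibatch and noise, compute the loss, take an Adagrad step."""
    rng = np.random.default_rng(seed)
    p = init_params(data.shape[1], hidden_dim, latent_dim, seed=seed)
    sq = {k: np.zeros_like(v) for k, v in p.items()}  # Adagrad accumulators
    losses = []
    for _ in range(n_iters):
        batch = data[rng.choice(len(data), size=batch_size, replace=False)]
        eps = rng.standard_normal((batch_size, latent_dim))
        loss, g = loss_and_grads(p, batch, eps)
        for k in p:  # updates both theta and phi
            sq[k] += g[k] ** 2
            p[k] -= lr * g[k] / (np.sqrt(sq[k]) + 1e-8)
        losses.append(loss)
    return p, losses


# Usage on toy data standing in for the Frey Face images
toy_data = np.random.default_rng(1).uniform(0.0, 1.0, size=(200, 4))
params, losses = train(toy_data, n_iters=1000)
```

Writing the gradients by hand shows exactly where the reparametrization trick matters: the gradient reaches ϕ through `d_z`, the derivative with respect to the sampled z. With an autodiff library, `loss_and_grads` would shrink to its forward part.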
Now, let us train for one million iterations on the Frey Face dataset:
Once trained, the decoder can sample new images that look very similar to those in the training set:
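Generating a new face only needs the decoder: draw z from the prior N(0, I) and decode it. Below is a minimal sketch with several assumptions. `sample_images` and `add_noise` are hypothetical names, and `decode` stands in for the trained decoder, meaning any function that returns a mean and a log-variance. Returning the decoder mean rather than a noisy sample is a common choice for display. The 28 x 20 reshape assumes Frey Face-sized images.

```python
import numpy as np


def sample_images(decode, n_samples, latent_dim, rng=None, add_noise=False):
    """Draw z ~ N(0, I) from the prior and push it through the decoder.

    `decode` maps z of shape (n_samples, latent_dim) to the mean and log-variance
    of p_theta(x|z). Returns the decoder mean, or, with add_noise=True, a sample
    x ~ N(mean, diag(exp(log_var))).
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal((n_samples, latent_dim))
    mean, log_var = decode(z)
    if add_noise:
        return mean + np.exp(0.5 * log_var) * rng.standard_normal(mean.shape)
    return mean


# Hypothetical stand-in for a trained decoder: 2-D latent space, 560 pixels
_W = np.random.default_rng(0).standard_normal((2, 560))


def toy_decode(z):
    return np.tanh(z @ _W), np.zeros((z.shape[0], 560))


images = sample_images(toy_decode, n_samples=8, latent_dim=2).reshape(-1, 28, 20)
```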
Of course, we have not yet implemented the core functions of the encoder and decoder class. I leave them to you as homework; solutions are available in the accompanying GitHub repository.
I hope you enjoyed the story. If you did, please leave me a clap and follow me for similar content. If you are interested in generative modelling, you may also like this post where I talk about Normalizing Flows, or this one where I implement the Generative Adversarial Networks paper in 100 lines of code.