Understanding Autoencoders
Many deep learning models rely on dimensionality reduction: summarizing a vector of a certain size into a smaller one. In this article I will present autoencoders, a type of neural network whose particular architecture makes it useful for many tasks.
What is the architecture of an autoencoder?
In practice, an autoencoder consists of two parts. The first part is the encoder. The encoder's goal is to condense the initially available data (image, text, audio, etc.) by extracting a feature vector that characterizes the initial information. The vector produced by the encoder is much smaller than the initial vector.
The decoder is the second part of the autoencoder. Its goal is to reconstruct the model’s input using the condensed vector.
For example, when working on images, a well-trained autoencoder can take an image as input, reduce it to a small vector via the encoder, then recreate the image from this small vector alone via the decoder.
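To make this concrete, here is a minimal sketch of a fully connected autoencoder in PyTorch. The sizes are illustrative assumptions (784 inputs for flattened 28×28 MNIST images, a latent dimension of 2), not a prescription; convolutional layers are a common alternative for images.

```python
import torch
from torch import nn

class Autoencoder(nn.Module):
    """Minimal fully connected autoencoder (illustrative layer sizes)."""
    def __init__(self, input_dim=784, latent_dim=2):
        super().__init__()
        # Encoder: condenses the input into a small latent vector
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Decoder: reconstructs the input from the latent vector
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128),
            nn.ReLU(),
            nn.Linear(128, input_dim),
            nn.Sigmoid(),  # pixel values in [0, 1]
        )

    def forward(self, x):
        z = self.encoder(x)     # condensed feature vector
        return self.decoder(z)  # reconstruction of the input
```

The bottleneck between the two blocks is where the condensed feature vector lives.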
What’s the latent space?
Now that we are familiar with the encoder and the decoder, let's present the third main piece of an autoencoder: the latent space.
The latent space corresponds to a new representation of our input data. In this new representation we keep only the most important information contained in the initial data while filtering out the noise: this is the principle of feature extraction.
When the latent space is well constructed, it preserves the similarities between data points and has a continuous structure. For example, with the MNIST database of handwritten digit images, here is the visualization of a latent space of size 2:
You can think of this image as a 2D coordinate system. You can see that the digit classes are well separated, and that similar digits are close to each other.
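For reference, a plot like this can be produced by encoding the test images with a model whose latent dimension is 2 and scatter-plotting the two latent coordinates, colored by digit class. This sketch assumes a trained instance of the Autoencoder class above (training is described in the next section).

```python
import matplotlib.pyplot as plt
import torch
from torchvision import datasets

# Assumes `model` is a trained Autoencoder with latent_dim=2
test_set = datasets.MNIST("data", train=False, download=True)
images = test_set.data.float().view(-1, 784) / 255.0  # flatten and scale to [0, 1]
labels = test_set.targets

with torch.no_grad():
    z = model.encoder(images)  # shape: (10000, 2)

plt.scatter(z[:, 0], z[:, 1], c=labels, cmap="tab10", s=2)
plt.colorbar(label="digit class")
plt.xlabel("latent dimension 1")
plt.ylabel("latent dimension 2")
plt.show()
```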
How do we train an autoencoder?
To train an autoencoder, we feed it input data that is encoded and then decoded. During training, we expect the model's output to be as close as possible to its input.
For the training, the loss can be the Euclidean distance between the input data and the output data. This loss is called the reconstruction loss.
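In symbols, writing $x$ for the input and $\hat{x}$ for the model's reconstruction, a common concrete choice is the squared Euclidean distance between the two, i.e. the (mean) squared error:

$$\mathcal{L}(x, \hat{x}) = \lVert x - \hat{x} \rVert^2 = \sum_{i} (x_i - \hat{x}_i)^2$$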
If we manage to reconstruct the input data using only the encoding, it means the latent vector contains enough information, and that both the encoder and the decoder are doing their job well.
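Putting this together, here is a minimal training loop sketch, assuming the Autoencoder class defined earlier and MNIST as the dataset; the learning rate, batch size and number of epochs are arbitrary illustrative choices.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Assumes the Autoencoder class from the architecture sketch above
model = Autoencoder(input_dim=784, latent_dim=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()  # reconstruction loss

train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=128, shuffle=True)

for epoch in range(10):
    for images, _ in loader:          # labels are not needed
        x = images.view(-1, 784)      # flatten 28x28 images
        x_hat = model(x)              # encode then decode
        loss = criterion(x_hat, x)    # compare the output to the input
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: reconstruction loss {loss.item():.4f}")
```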
What are the applications of an autoencoder?
As I mentioned in the introduction, autoencoders have a lot of applications. Some of them are presented in this section.
Using autoencoders to generate images
Before GANs became the standard for image and art generation, autoencoders were used for this purpose. To generate new images, we only need the decoder: we decode vectors from the latent space that were not initially present in our dataset.
We can also perform basic operations on images, encode the result, and decode it to obtain a new item.
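As an illustration, one simple way to obtain latent vectors that were never seen in the dataset is to interpolate between the encodings of two real images and decode the intermediate points. This sketch assumes the trained model from the previous sections; x1 and x2 are hypothetical placeholders for two flattened MNIST images.

```python
import torch

# Assumes `model` is the trained Autoencoder, and x1, x2 are two flattened
# images of shape (784,), e.g. taken from the MNIST test set
with torch.no_grad():
    z1 = model.encoder(x1.unsqueeze(0))  # encode both images
    z2 = model.encoder(x2.unsqueeze(0))
    for alpha in torch.linspace(0, 1, steps=5):
        z = (1 - alpha) * z1 + alpha * z2        # a latent vector not in the dataset
        new_image = model.decoder(z).view(28, 28)  # decode it into a new image
        # `new_image` can be displayed with matplotlib's imshow, for example
```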
Autoencoders for image denoising
Autoencoders can also be used for image denoising.
To train an autoencoder of this kind, one possible option is to simulate noise on the training data: feed the noisy images to the autoencoder as input and use the original, clean images as the target output.
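Here is a minimal sketch of that setup, reusing the model, optimizer, loss and data loader from the training sketch above; the Gaussian noise level is an arbitrary choice for illustration.

```python
import torch

# Assumes `model`, `optimizer`, `criterion` and `loader` from the training sketch above
noise_std = 0.3  # arbitrary noise level, chosen for illustration

for epoch in range(10):
    for images, _ in loader:
        x = images.view(-1, 784)
        noisy_x = x + noise_std * torch.randn_like(x)  # simulate noise on the input
        noisy_x = noisy_x.clamp(0.0, 1.0)              # keep pixel values in [0, 1]
        x_hat = model(noisy_x)       # input: noisy image
        loss = criterion(x_hat, x)   # target: clean image
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```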
In practice, the best autoencoder-based solutions for generating or denoising images rely on variational autoencoders (VAEs). VAEs learn the probability distribution of the data and thus ensure the continuity of the resulting encodings. That will be the subject of a future article.