A Practical Guide to Neural Neighbor Style Transfer


Neural Neighbor Style Transfer (NNST) is the improved version of the original Neural Style Transfer (NST). Neural Style Transfer is a type of algorithm that stylizes the digital image or video by adopting the visual style of another image. It is an algorithm based on deep learning that creates digital illustrations from photographs, for example by stylizing the images given to the user using famous paintings. Neural Neighbor Style Transfer is the latest model used to create styled images, it can be implemented by PyTorch deep learning framework. In this article, we will understand Neural Neighbor Style Transfer, from its introduction to how it works. Below are the main points that we are going to discuss in this article.


  1. Introduction to Neural Neighbor Style Transfer
  2. Variants of the Neural Neighbor style transfer
  3. Overview of how NNST works
  4. Model evaluation
  5. Implementation of NNST

Let’s start by understanding the introduction to Neural Neighbor style transfer.

Introduction to Neural Neighbor Style Transfer (NNST)

In 2015, Leon A. Gatys proposed the original neural style transfer, using input photos, we can create artworks with similar content and distinct visual characteristics, of course, it uses the networks of neurons to create artworks as neural network methods have the advantage of not requiring any additional information to perform the stylization, users only need to specify the input image or image content and style image (any artistic painting) to create a stylized image.

This model works on two main ideas. The first shows how to use features extracted by convolutional neural networks that are pre-trained for image classification tasks to measure high-level compatibility between two images. The second idea is to use gradient descent to directly optimize the output pixels to minimize both “style loss” (statistical matching based on VGG artwork features) and “content loss” ( reduction of deviations from the characteristics of the content image).

Typically, it’s not possible to match style feature stats and keep content features as-is. To address this drawback, Neural Neighbor Style Transfer takes the idea of ​​guided patch-based synthesis (this is an algorithm used to find similarities between small patches of images) to explicitly match the content and style features. But the model does not work directly on the image patches; instead, it uses features taken from the pre-trained VGG network.

Neural Neighbor style transfer is more generalized and efficient than Neural style transfer.

The NNST approach is to replace the neural features extracted from the content image with the style image and in the end synthesize the final output.

Variants of the Neural Neighbor style transfer

There are two variations of the NNST model, the first is NNST-D and the second is NNST-Opt. The NNST-D decodes the styled output directly from the rearranged style features using a convolutional neural network; it can work better and faster than previous methods, this template only takes a few seconds to stylize 512 X 512 pixel output. NNST-Opt is the optimization-based type which provides better quality but at slower speed, it takes 30 seconds for 512 X 512.

Overview of how NNST works

The main steps are given in the figure below.

Source of images

This is a simplified version of the original NNST; several details are removed for better understanding.

First, we extract the characteristics of the content image and the style image. Next, zero-center the content image function and the style image function. Center zero means that the data should be processed so that the average (average) is zero, this is a common image pre-processing that helps the neural network perform better. After that, replace the content image features with the closest style feature. In the fourth block, NNST-D and NNST-Opt are implemented, we will discuss the two one by one.

Neural Neighbor Style Transfer Decoder (NNST-D): the ‘G’ decoder takes the input, then this input image is inserted into 4 independent branches, Each branch produces a level of a 4-level Laplacian pyramid parameterizing the output image. (A Laplacian pyramid is an invertible linear representation of an image, by dividing the image into different isotropic spatial frequency bands, the Laplacian pyramid provides another level of analysis).

Each branch has the same architecture but with different parameters, it consists of five 3*3 Conv layers with a leaky ReLU activation function, but the last layer has a linear activation function. All hidden states have 256 channels. In order to trade off-channel depth for resolution, transposed convolutions are applied to the first two branches.

No changes are made to the third branch. In the fourth branch, the output is bilinearly downsampled by a factor of two. The final output image is generated by combining the four branch outputs. Each branch exit is treated as a level in a Laplacian pyramid. In training, the model uses MS-COCO data for content images and Wikiart for style images.

Neural Neighbor Style Forwarding Optimization (NNST-Opt): NNST-Opt is the same as NNST-D but an optimized version. When performing style transfers using G is fast, optimizing the output image often leads directly to cleaner, artifact-free results. So it minimizes the equation, using Adam, 200 updates of x (output image) have been minimized. x (output image) parameterized as an 8-level Laplacian pyramid to allow large regions to quickly change color in 200 updates.

The last part is optional and styles the content color using moment matching.

Model evaluation

Using a dataset of 30 high-resolution photographs from Flickr, we compare the performance of our method and previous work. A collection of ten ink drawings and ten watercolors are collected and ten works of impressionist art created between 1800 and 1900 come from the open source collection of the Rijksmuseum. These images were added to the ten pencil drawings taken from the dataset used in Im2Pencil. Finally, it becomes the dataset of a total of 40 high-resolution style images.

After performing both variants on the same content/style 1200 possible combinations. We found that despite the good results of the encoder variant, they are not as good as the optimization-based model. It will therefore be crucial to reduce this gap to develop a practical high-quality stylization algorithm.

Implementation of NNST

We can implement the NNST using the PyTorch deep learning framework. Start by cloning the git repository, you will get all the folders and the main python file styleTransfer.py.

!git clone https://github.com/nkolkin13/NeuralNeighborStyleTransfer.git

Then run this command for style transfer.

python styleTransfer.py –content_path PATH_TO_CONTENT_IMAGE –style_path PATH_TO_STYLE_IMAGE –output_path PATH_TO_OUTPUT

PATH_TO_CONTENT_IMAGE: path to content image

PATH_TO_STYLE_IMAGE: path to style image

PATH_TO_OUTPUT: path where you want to save the output style image.

Also, this instruction will download a nearly 500MB pre-trained VGG model. The model took 190 seconds to finish working in the Colab. After that we can get the output image on a given path. On the left is the content image and in the middle is the style image and the right image is the output of the neural style transformation. The model did a good job.

Choose another style image for neural style transfer.

We can observe that the model does not preserve the image details of the content. So to control this, there is an alpha value by which you can retain content details. The alpha value is between 0 and 1, for maximum content preservation use alpha == 0. The default alpha value is 0.75.

So I changed the alpha value to 0.5 and ran the model again.

python styleTransfer.py –content_path PATH_TO_CONTENT_IMAGE –style_path PATH_TO_STYLE_IMAGE –output_path PATH_TO_OUTPUTalpha 0.5

See the results of how it manages to preserve content details this time.

With an alpha value of 0.8, also did a very good job.

You can play around with the Alpha value and choose different styles of images for different results.

Last words

In this article, with the help of the original Neural Style Transfer, we understand what Neural Neighbor Style Transfer is, we also get an overview of how it works and explain its variants. In the end, we implemented the NNST-Opt to create style images, we tried different style images and alpha values ​​for the style image.



Comments are closed.