Network architecture


Main modifications to the structure of the generator G, the discriminator D, and the training process, compared to SRGAN:

  1. all BN layers were removed from the generator;

  2. original basic blocks were replaced with the proposed Residual-in-Residual Dense Blocks (RRDB), which combine a multi-level residual network with dense connections;

  3. a relativistic discriminator is used, which tries to predict the probability that a real image \(x_r\) is relatively more realistic than a fake one \(x_f\);

  4. perceptual loss on features before activation.
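
The relativistic discriminator in item 3 can be illustrated with a small sketch. It assumes the averaged variant, in which each raw discriminator output \(C(x)\) is compared against the mean output of the opposite batch before the sigmoid. The function names and the `eps` constant are illustrative and are not taken from the original code.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mean(values):
    return sum(values) / len(values)


def relativistic_d_loss(real_logits, fake_logits, eps=1e-12):
    """Discriminator loss: real images should score above the average fake,
    and fake images should score below the average real."""
    mean_real = mean(real_logits)
    mean_fake = mean(fake_logits)
    loss_real = mean([-math.log(sigmoid(c - mean_fake) + eps) for c in real_logits])
    loss_fake = mean([-math.log(1.0 - sigmoid(c - mean_real) + eps) for c in fake_logits])
    return loss_real + loss_fake


def relativistic_g_loss(real_logits, fake_logits, eps=1e-12):
    """Generator loss: the symmetric counterpart, with the targets swapped."""
    mean_real = mean(real_logits)
    mean_fake = mean(fake_logits)
    loss_real = mean([-math.log(1.0 - sigmoid(c - mean_fake) + eps) for c in real_logits])
    loss_fake = mean([-math.log(sigmoid(c - mean_real) + eps) for c in fake_logits])
    return loss_real + loss_fake
```

Because the generator loss involves both the real and the fake logits, the generator receives a training signal from real images as well as from its own outputs, unlike in a standard GAN loss.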

Some results

LR (low resolution)


ESRGAN (ours)

HR (high resolution)

../_images/0802lr.png ../_images/0802sr.png ../_images/0802.png ../_images/0802hr.png
../_images/0805lr.png ../_images/0805sr.png ../_images/0805.png ../_images/0805hr.png
../_images/0811lr.png ../_images/0811sr.png ../_images/0811.png ../_images/0811hr.png
../_images/0815lr.png ../_images/0815sr.png ../_images/0815.png ../_images/0815hr.png
../_images/0829lr.png ../_images/0829sr.png ../_images/0829.png ../_images/0829hr.png
../_images/0845lr.png ../_images/0845sr.png ../_images/0845.png ../_images/0845hr.png
../_images/0853lr.png ../_images/0853sr.png ../_images/0853.png ../_images/0853hr.png
../_images/0857lr.png ../_images/0857sr.png ../_images/0857.png ../_images/0857hr.png
../_images/0886lr.png ../_images/0886sr.png ../_images/0886.png ../_images/0886hr.png
../_images/0887lr.png ../_images/0887sr.png ../_images/0887.png ../_images/0887hr.png

Qualitative results

PSNR (evaluated on the Y channel) and the perceptual index used in the PIRM-SR challenge are also provided for reference. [1]

../_images/qualitative_cmp_01.jpg ../_images/qualitative_cmp_02.jpg ../_images/qualitative_cmp_03.jpg ../_images/qualitative_cmp_04.jpg
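
As a rough illustration of what "PSNR evaluated on the Y channel" means, the sketch below converts RGB pixels to luma and computes PSNR on that single channel. The BT.601 coefficients with a 16 to 235 output range are an assumption (a common convention in super-resolution evaluation), and the helper names are hypothetical. The exact conversion used for the reported numbers is not stated here.

```python
import math


def rgb_to_y(pixel):
    """Map an (R, G, B) pixel with values in 0..255 to BT.601 luma (assumed convention)."""
    r, g, b = pixel
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0


def psnr_y(img_a, img_b, peak=255.0):
    """PSNR in dB between two images given as nested lists of RGB tuples."""
    ya = [rgb_to_y(p) for row in img_a for p in row]
    yb = [rgb_to_y(p) for row in img_b for p in row]
    if len(ya) != len(yb):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(ya, yb)) / len(ya)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

Higher PSNR means lower pixel-wise error, which does not always agree with perceived visual quality.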

Ablation study

Overall visual comparison showing the effect of each component in ESRGAN. Each column represents a model, with its configuration shown at the top. The red sign indicates the main improvement compared with the previous model. [1]


BatchNorm artifacts

We empirically observe that BN layers tend to introduce artifacts. These artifacts, referred to as BN artifacts, appear occasionally across iterations and under different settings, which conflicts with the need for stable performance during training. We find that the network depth, BN position, training dataset, and training loss all have an impact on the occurrence of BN artifacts. [1]


Useful techniques to train a very deep network

We find that residual scaling and smaller initialization can help to train a very deep network.

  • A smaller initialization than MSRA initialization (multiplying all parameters calculated by MSRA initialization by 0.1) works well in our experiments;

  • In our settings, for each residual block, the residual features after the last convolution layer are multiplied by 0.2. [1]

init a init b
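
The two techniques above can be sketched in a few lines. This is a minimal illustration in plain Python and not the original training code: the helper names, the Gaussian form of MSRA initialization (standard deviation \(\sqrt{2 / \text{fan\_in}}\)), and the flat-list representation of features are assumptions made for the example.

```python
import math
import random


def msra_std(fan_in):
    """Standard deviation used by MSRA (He) normal initialization."""
    return math.sqrt(2.0 / fan_in)


def scaled_msra_weights(fan_in, count, scale=0.1, seed=0):
    """Draw `count` weights from the MSRA distribution, then shrink them by `scale`."""
    rng = random.Random(seed)
    std = scale * msra_std(fan_in)
    return [rng.gauss(0.0, std) for _ in range(count)]


def residual_add(identity, branch_out, res_scale=0.2):
    """Add the residual branch to the identity path after scaling it by `res_scale`."""
    return [x + res_scale * r for x, r in zip(identity, branch_out)]
```

For a 3x3 convolution with 64 input channels, `fan_in` would be 3 * 3 * 64 = 576. Both steps reduce the magnitude of what each block adds to the identity path, which is the intuition behind their stabilizing effect in very deep networks.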

The influence of training patch size

We observe that training a deeper network benefits from a larger patch size. Moreover, the deeper model achieves a larger improvement (∼0.12 dB) than the shallower one (∼0.04 dB), since a larger model capacity can take full advantage of a larger training patch size. (Evaluated on the Set5 dataset with RGB channels.) [1]

16 blocks 23 blocks
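
To make "training patch size" concrete, the sketch below crops an aligned pair of LR and HR patches from a training image pair. It is only an illustration: the function name, the nested-list image representation, and the default values for `hr_patch` and `scale` are assumptions and are not taken from the original training setup.

```python
import random


def random_paired_crop(lr, hr, hr_patch=128, scale=4, rng=random):
    """Crop spatially aligned LR/HR patches from images stored as nested lists.

    `hr_patch` is the HR patch side length; the LR patch side is `hr_patch // scale`.
    """
    lr_patch = hr_patch // scale
    lr_h, lr_w = len(lr), len(lr[0])
    # Choose the top-left corner in LR coordinates, then map it to HR coordinates.
    top = rng.randrange(lr_h - lr_patch + 1)
    left = rng.randrange(lr_w - lr_patch + 1)
    lr_crop = [row[left:left + lr_patch] for row in lr[top:top + lr_patch]]
    hr_top, hr_left = top * scale, left * scale
    hr_crop = [row[hr_left:hr_left + hr_patch] for row in hr[hr_top:hr_top + hr_patch]]
    return lr_crop, hr_crop
```

A larger `hr_patch` gives the network more spatial context per training sample, at the cost of more memory and compute per batch.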