Models

Generator

class esrgan.model.generator.EncoderDecoderNet(encoder: torch.nn.modules.module.Module, decoder: torch.nn.modules.module.Module)[source]

Generalized Encoder-Decoder network.

Parameters
  • encoder – Encoder module, usually used for the extraction of embeddings from input signals.

  • decoder – Decoder module, usually used to process embeddings, e.g. to generate a signal similar to the input one (in GANs).

forward(x: torch.Tensor) → torch.Tensor[source]

Forward pass method.

Parameters

x – Batch of input signals e.g. images.

Returns

Batch of generated signals e.g. images.

classmethod get_from_params(encoder_params: Optional[dict] = None, decoder_params: Optional[dict] = None) → esrgan.model.generator.EncoderDecoderNet[source]

Create a model based on its config.

Parameters
  • encoder_params – Encoder module params.

  • decoder_params – Decoder module parameters.

Returns

Model.
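
As a rough illustration of the composition described above, the following sketch chains an encoder and a decoder in plain PyTorch. `TinyEncoderDecoder` and the two toy convolutions are hypothetical stand-ins chosen for this example, not the library's own classes.

```python
import torch
from torch import nn


class TinyEncoderDecoder(nn.Module):
    """Hypothetical stand-in: applies `encoder`, then `decoder`."""

    def __init__(self, encoder: nn.Module, decoder: nn.Module) -> None:
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # embeddings are extracted first, then turned back into a signal
        return self.decoder(self.encoder(x))


# toy modules (assumed for illustration): 3 -> 64 channels and back to 3
encoder = nn.Conv2d(3, 64, kernel_size=3, padding=1)
decoder = nn.Conv2d(64, 3, kernel_size=3, padding=1)
model = TinyEncoderDecoder(encoder, decoder)

images = torch.rand(2, 3, 16, 16)
output = model(images)
print(output.shape)  # torch.Size([2, 3, 16, 16])
```

Any pair of modules works as long as the decoder accepts the tensor shape produced by the encoder.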

SRGAN

class esrgan.model.module.srresnet.SRResNetEncoder(in_channels: int = 3, out_channels: int = 64, num_basic_blocks: int = 16, conv_fn: Union[Callable[[...], torch.nn.modules.module.Module], str, Dict[str, Any]] = functools.partial(<class 'torch.nn.modules.conv.Conv2d'>, kernel_size=(3, 3), padding=1), norm_fn: Union[Callable[[...], torch.nn.modules.module.Module], str, Dict[str, Any]] = <class 'torch.nn.modules.batchnorm.BatchNorm2d'>, activation_fn: Union[Callable[[...], torch.nn.modules.module.Module], str, Dict[str, Any]] = <class 'torch.nn.modules.activation.PReLU'>)[source]

‘Encoder’ part of SRResNet network, processing images in LR space.

It has been proposed in Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network.

Parameters
  • in_channels – Number of channels in the input image.

  • out_channels – Number of channels produced by the encoder.

  • num_basic_blocks – Depth of the encoder, number of basic blocks to use.

  • conv_fn – Convolutional layers parameters.

  • norm_fn – Batch norm layer to use.

  • activation_fn – Activation function to use after BN layers.

forward(x: torch.Tensor) → torch.Tensor[source]

Forward pass.

Parameters

x – Batch of images.

Returns

Batch of embeddings.
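
For orientation, an SRResNet-style encoder can be approximated as a stem convolution followed by a stack of residual basic blocks. The sketch below follows the residual block layout from the SRGAN paper (conv, BN, PReLU, conv, BN, plus a skip connection); `BasicBlock`, the 3x3 stem and the depth of 4 are assumptions made for brevity, and the library's exact layer ordering may differ.

```python
import torch
from torch import nn


class BasicBlock(nn.Module):
    """Hypothetical residual block: conv -> BN -> PReLU -> conv -> BN, plus skip."""

    def __init__(self, channels: int = 64) -> None:
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.PReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # identity connection keeps the low-resolution features flowing
        return x + self.body(x)


num_basic_blocks = 4  # the documented default is 16; kept small here
encoder = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),  # stem: image -> feature maps
    nn.PReLU(),
    *[BasicBlock(64) for _ in range(num_basic_blocks)],
)

embeddings = encoder(torch.rand(2, 3, 24, 24))
print(embeddings.shape)  # torch.Size([2, 64, 24, 24])
```

The spatial size is unchanged because all processing happens in LR space; upscaling is left to the decoder.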

class esrgan.model.module.srresnet.SRResNetDecoder(in_channels: int = 64, out_channels: int = 3, scale_factor: int = 2, conv_fn: Union[Callable[[...], torch.nn.modules.module.Module], str, Dict[str, Any]] = functools.partial(<class 'torch.nn.modules.conv.Conv2d'>, kernel_size=(3, 3), padding=1), activation_fn: Union[Callable[[...], torch.nn.modules.module.Module], str, Dict[str, Any]] = <class 'torch.nn.modules.activation.PReLU'>)[source]

‘Decoder’ part of SRResNet, converting embeddings to output image.

It has been proposed in Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network.

Parameters
  • in_channels – Number of channels in the input embedding.

  • out_channels – Number of channels in the output image.

  • scale_factor – Ratio between the size of the high-resolution image (output) and its low-resolution counterpart (input). In other words, the multiplier for the spatial size.

  • conv_fn – Convolutional layers parameters.

  • activation_fn – Activation function to use.

forward(x: torch.Tensor) → torch.Tensor[source]

Forward pass.

Parameters

x – Batch of embeddings.

Returns

Batch of upscaled images.

ESRGAN

class esrgan.model.module.esrnet.ESREncoder(in_channels: int = 3, out_channels: int = 64, growth_channels: int = 32, num_basic_blocks: int = 23, num_dense_blocks: int = 3, num_residual_blocks: int = 5, conv_fn: Union[Callable[[...], torch.nn.modules.module.Module], str, Dict[str, Any]] = functools.partial(<class 'torch.nn.modules.conv.Conv2d'>, kernel_size=(3, 3), padding=1), activation_fn: Union[Callable[[...], torch.nn.modules.module.Module], str, Dict[str, Any]] = functools.partial(<class 'torch.nn.modules.activation.LeakyReLU'>, negative_slope=0.2, inplace=True), residual_scaling: float = 0.2)[source]

‘Encoder’ part of ESRGAN network, processing images in LR space.

It has been proposed in ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks.

Parameters
  • in_channels – Number of channels in the input image.

  • out_channels – Number of channels produced by the encoder.

  • growth_channels – Number of channels in the latent space.

  • num_basic_blocks – Depth of the encoder, i.e. the number of Residual-in-Residual Dense Blocks (RRDBs) to use.

  • num_dense_blocks – Number of dense blocks used to form an RRDB.

  • num_residual_blocks – Number of convolutions used to form a dense block.

  • conv_fn – Convolutional layers parameters.

  • activation_fn – Activation function to use after convolutional layers.

  • residual_scaling – Residual connections scaling factor.

forward(x: torch.Tensor) → torch.Tensor[source]

Forward pass.

Parameters

x – Batch of images.

Returns

Batch of embeddings.

class esrgan.model.module.esrnet.ESRNetDecoder(in_channels: int = 64, out_channels: int = 3, scale_factor: int = 2, conv_fn: Union[Callable[[...], torch.nn.modules.module.Module], str, Dict[str, Any]] = functools.partial(<class 'torch.nn.modules.conv.Conv2d'>, kernel_size=(3, 3), padding=1), activation_fn: Union[Callable[[...], torch.nn.modules.module.Module], str, Dict[str, Any]] = functools.partial(<class 'torch.nn.modules.activation.LeakyReLU'>, negative_slope=0.2, inplace=True))[source]

‘Decoder’ part of ESRGAN, converting embeddings to output image.

It has been proposed in ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks.

Parameters
  • in_channels – Number of channels in the input embedding.

  • out_channels – Number of channels in the output image.

  • scale_factor – Ratio between the size of the high-resolution image (output) and its low-resolution counterpart (input). In other words, the multiplier for the spatial size.

  • conv_fn – Convolutional layers parameters.

  • activation_fn – Activation function to use.

forward(x: torch.Tensor) → torch.Tensor[source]

Forward pass.

Parameters

x – Batch of embeddings.

Returns

Batch of upscaled images.

Discriminator

class esrgan.model.discriminator.VGGConv(encoder: torch.nn.modules.module.Module, pool: torch.nn.modules.module.Module, head: torch.nn.modules.module.Module)[source]

VGG-like neural network for image classification.

Parameters
  • encoder – Image encoder module, usually used for the extraction of embeddings from input signals.

  • pool – Pooling layer, used to reduce embeddings from the encoder.

  • head – Classification head, usually consists of Fully Connected layers.

forward(x: torch.Tensor) → torch.Tensor[source]

Forward call.

Parameters

x – Batch of images.

Returns

Batch of logits.

classmethod get_from_params(encoder_params: Optional[dict] = None, pooling_params: Optional[dict] = None, head_params: Optional[dict] = None) → esrgan.model.discriminator.VGGConv[source]

Create a model based on its config.

Parameters
  • encoder_params – Params of encoder module.

  • pooling_params – Params of the pooling layer.

  • head_params – ‘Head’ module params.

Returns

Model.
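
The encoder, pool and head pipeline described above can be sketched with plain PyTorch modules. The layer sizes, the use of `nn.AdaptiveAvgPool2d` and the explicit flatten step are assumptions for illustration, not a description of the library's internals.

```python
import torch
from torch import nn

# toy image encoder (assumed sizes): two conv blocks, the second one strided
encoder = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.LeakyReLU(0.2),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
    nn.LeakyReLU(0.2),
)
pool = nn.AdaptiveAvgPool2d(1)  # reduces each feature map to a single value
head = nn.Linear(32, 1)  # one logit per image: "real" vs "generated"


def discriminate(x: torch.Tensor) -> torch.Tensor:
    features = encoder(x)
    pooled = pool(features).flatten(start_dim=1)  # (B, 32, 1, 1) -> (B, 32)
    return head(pooled)


logits = discriminate(torch.rand(4, 3, 32, 32))
print(logits.shape)  # torch.Size([4, 1])
```

Because the pooling layer is adaptive, the same head works for any input resolution.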

class esrgan.model.module.conv.StridedConvEncoder(layers: Iterable[int] = (3, 64, 128, 128, 256, 256, 512, 512), layer_order: Iterable[str] = ('conv', 'norm', 'activation'), conv_fn: Union[Callable[[...], torch.nn.modules.module.Module], str, Dict[str, Any]] = functools.partial(<class 'torch.nn.modules.conv.Conv2d'>, kernel_size=(3, 3), padding=1), activation_fn: Union[Callable[[...], torch.nn.modules.module.Module], str, Dict[str, Any]] = functools.partial(<class 'torch.nn.modules.activation.LeakyReLU'>, negative_slope=0.2, inplace=True), norm_fn: Union[Callable[[...], torch.nn.modules.module.Module], str, Dict[str, Any], None] = <class 'torch.nn.modules.batchnorm.BatchNorm2d'>, residual_fn: Union[Callable[[...], torch.nn.modules.module.Module], str, Dict[str, Any], None] = None)[source]

Generalized Fully Convolutional encoder.

Parameters
  • layers – List of feature map sizes (channel counts) for each block.

  • layer_order – Ordered list of layers applied within each block. For instance, to skip the normalization layer, exclude it from this list.

  • conv_fn – Convolutional layer params.

  • activation_fn – Activation function to use.

  • norm_fn – Normalization layer params, e.g. nn.BatchNorm2d.

  • residual_fn – Block wrapper function, e.g. ResidualModule can be used to add residual connections between blocks.

forward(x: torch.Tensor) → torch.Tensor[source]

Forward pass.

property in_channels

The number of channels in the feature map of the input.

property out_channels

Number of channels produced by the block.
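
One way to read the `layers` and `layer_order` parameters is as a recipe for a sequence of blocks, one per consecutive pair of channel counts. The builder below is a hypothetical sketch of that idea; in particular, the rule that every second block uses stride 2 is an assumption, not something stated in this reference.

```python
from typing import Iterable, List

import torch
from torch import nn


def build_encoder(
    layers: Iterable[int] = (3, 64, 128, 128),
    layer_order: Iterable[str] = ("conv", "norm", "activation"),
) -> nn.Sequential:
    """Hypothetical builder: one block per consecutive pair in `layers`."""
    channels = list(layers)
    order = list(layer_order)
    blocks: List[nn.Module] = []
    for i, (c_in, c_out) in enumerate(zip(channels[:-1], channels[1:])):
        # assumption: every second block halves the spatial size
        stride = 2 if i % 2 == 1 else 1
        block: List[nn.Module] = []
        for name in order:
            if name == "conv":
                block.append(
                    nn.Conv2d(c_in, c_out, kernel_size=3, stride=stride, padding=1)
                )
            elif name == "norm":
                # assumes "norm" is listed after "conv", so it sees c_out channels
                block.append(nn.BatchNorm2d(c_out))
            elif name == "activation":
                block.append(nn.LeakyReLU(negative_slope=0.2, inplace=True))
            else:
                raise ValueError(f"unknown layer name: {name}")
        blocks.append(nn.Sequential(*block))
    return nn.Sequential(*blocks)


encoder = build_encoder()
features = encoder(torch.rand(2, 3, 32, 32))
print(features.shape)  # torch.Size([2, 128, 16, 16])

# dropping "norm" from the order removes every normalization layer
no_norm = build_encoder(layer_order=("conv", "activation"))
```

Under this reading, `in_channels` corresponds to the first entry of `layers` and `out_channels` to the last.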

class esrgan.model.module.linear.LinearHead(in_channels: int, out_channels: int, latent_channels: Optional[Iterable[int]] = None, layer_order: Iterable[str] = ('linear', 'activation'), linear_fn: Union[Callable[[...], torch.nn.modules.module.Module], str, Dict[str, Any]] = <class 'torch.nn.modules.linear.Linear'>, activation_fn: Union[Callable[[...], torch.nn.modules.module.Module], str, Dict[str, Any]] = functools.partial(<class 'torch.nn.modules.activation.LeakyReLU'>, negative_slope=0.2, inplace=True), norm_fn: Union[Callable[[...], torch.nn.modules.module.Module], str, Dict[str, Any], None] = None, dropout_fn: Union[Callable[[...], torch.nn.modules.module.Module], str, Dict[str, Any], None] = None)[source]

Stack of linear layers used for embeddings classification.

Parameters
  • in_channels – Size of each input sample.

  • out_channels – Size of each output sample.

  • latent_channels – Sizes of the latent (hidden) layers.

  • layer_order – Ordered list of layers applied within each block. For instance, to skip the normalization layer, exclude it from this list.

  • linear_fn – Linear layer params.

  • activation_fn – Activation function to use.

  • norm_fn – Normalization layer params, e.g. nn.BatchNorm1d.

  • dropout_fn – Dropout layer params, e.g. nn.Dropout.

forward(x: torch.Tensor) → torch.Tensor[source]

Forward pass.

Parameters

x – Batch of inputs e.g. images.

Returns

Batch of logits.
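
A stack of linear layers of this kind can be sketched as follows. `build_head` is a hypothetical helper; it assumes that `latent_channels` lists the hidden layer sizes and that no activation follows the last layer, so the outputs stay raw logits.

```python
from typing import Iterable, List, Optional

import torch
from torch import nn


def build_head(
    in_channels: int,
    out_channels: int,
    latent_channels: Optional[Iterable[int]] = None,
) -> nn.Sequential:
    """Hypothetical builder: linear layers with LeakyReLU in between."""
    sizes = [in_channels, *(latent_channels or []), out_channels]
    layers: List[nn.Module] = []
    for i, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:])):
        layers.append(nn.Linear(n_in, n_out))
        # assumption: the final layer is left without an activation
        if i < len(sizes) - 2:
            layers.append(nn.LeakyReLU(negative_slope=0.2, inplace=True))
    return nn.Sequential(*layers)


head = build_head(512, 1, latent_channels=[1024])
logits = head(torch.rand(4, 512))
print(logits.shape)  # torch.Size([4, 1])
```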

Layers

These are the basic building blocks for graphs.

Containers

class esrgan.model.module.blocks.container.ConcatInputModule(module: Iterable[torch.nn.modules.module.Module])[source]

Module wrapper, passing outputs of all previous layers into each next layer.

Parameters

module – PyTorch layers to wrap.

forward(x: torch.Tensor) → torch.Tensor[source]

Forward pass.
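
The dense connectivity described above, where each layer receives the outputs of all previous layers, can be sketched by concatenating feature maps along the channel dimension. `DenseChain` is a hypothetical name, and returning only the last layer's output is an assumption of this sketch.

```python
from typing import Iterable

import torch
from torch import nn


class DenseChain(nn.Module):
    """Hypothetical wrapper: layer i sees the input plus outputs of layers 0..i-1."""

    def __init__(self, layers: Iterable[nn.Module]) -> None:
        super().__init__()
        self.layers = nn.ModuleList(list(layers))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for layer in self.layers:
            # concatenate everything produced so far along the channel axis
            out = layer(torch.cat(features, dim=1))
            features.append(out)
        return features[-1]


# input channels grow as 4 -> 4 + 2 -> 4 + 2 + 2
chain = DenseChain(
    [
        nn.Conv2d(4, 2, kernel_size=3, padding=1),
        nn.Conv2d(6, 2, kernel_size=3, padding=1),
        nn.Conv2d(8, 4, kernel_size=3, padding=1),
    ]
)
y = chain(torch.rand(1, 4, 8, 8))
print(y.shape)  # torch.Size([1, 4, 8, 8])
```

Each wrapped layer must therefore be constructed with an input width equal to the sum of all preceding channel counts.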

class esrgan.model.module.blocks.container.ResidualModule(module: torch.nn.modules.module.Module, scale: float = 1.0, requires_grad: bool = False)[source]

Residual wrapper, adds identity connection.

It has been proposed in Deep Residual Learning for Image Recognition.

Parameters
  • module – PyTorch layer to wrap.

  • scale – Residual connections scaling factor.

  • requires_grad – If set to False, the layer will not learn the strength of the residual connection.

forward(x: torch.Tensor) → torch.Tensor[source]

Forward pass.
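
A scaled identity connection of this kind computes `x + scale * module(x)`. The sketch below is a minimal version under that assumption; `ScaledResidual` is a hypothetical name, and storing the scale as an `nn.Parameter` is one possible way to make it optionally learnable.

```python
import torch
from torch import nn


class ScaledResidual(nn.Module):
    """Hypothetical wrapper computing x + scale * module(x)."""

    def __init__(
        self, module: nn.Module, scale: float = 1.0, requires_grad: bool = False
    ) -> None:
        super().__init__()
        self.module = module
        # with requires_grad=False the scale stays fixed during training
        self.scale = nn.Parameter(torch.tensor(scale), requires_grad=requires_grad)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.scale * self.module(x)


block = ScaledResidual(nn.Identity(), scale=0.2)
x = torch.rand(2, 8)
out = block(x)  # x + 0.2 * x
```

With `requires_grad=True` the optimizer would also update `scale`, letting the network learn how strongly the wrapped branch contributes.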

Residual-in-Residual Block

class esrgan.model.module.blocks.rrdb.ResidualDenseBlock(num_features: int, growth_channels: int, num_blocks: int = 5, conv_fn: Union[Callable[[...], torch.nn.modules.module.Module], str, Dict[str, Any]] = functools.partial(<class 'torch.nn.modules.conv.Conv2d'>, kernel_size=(3, 3), padding=1), activation_fn: Union[Callable[[...], torch.nn.modules.module.Module], str, Dict[str, Any]] = functools.partial(<class 'torch.nn.modules.activation.LeakyReLU'>, negative_slope=0.2, inplace=True), residual_scaling: float = 0.2)[source]

Basic block of ResidualInResidualDenseBlock.

Parameters
  • num_features – \(C\) from an expected input of size \((N, C, H, W)\).

  • growth_channels – Number of channels in the latent space.

  • num_blocks – Number of convolutional blocks used to form a dense block.

  • conv_fn – Convolutional layers parameters.

  • activation_fn – Activation function to use after each conv layer.

  • residual_scaling – Residual connections scaling factor.
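
To see how `num_features` and `growth_channels` interact, the channel counts of each convolution in a dense block can be worked out by hand. The helper below is a hypothetical sketch; it assumes dense connectivity (every convolution sees the block input plus all earlier outputs) and that the last convolution maps back to `num_features` so the residual can be added, as in the ESRGAN paper.

```python
from typing import List, Tuple


def dense_block_channels(
    num_features: int, growth_channels: int, num_blocks: int = 5
) -> List[Tuple[int, int]]:
    """Return (in_channels, out_channels) for each convolution in the block."""
    shapes = []
    for i in range(num_blocks):
        # block input plus the outputs of the i previous convolutions
        in_channels = num_features + i * growth_channels
        # assumption: the last convolution restores the input width
        is_last = i == num_blocks - 1
        out_channels = num_features if is_last else growth_channels
        shapes.append((in_channels, out_channels))
    return shapes


shapes = dense_block_channels(num_features=64, growth_channels=32)
print(shapes)
# [(64, 32), (96, 32), (128, 32), (160, 32), (192, 64)]
```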

class esrgan.model.module.blocks.rrdb.ResidualInResidualDenseBlock(num_features: int = 64, growth_channels: int = 32, num_dense_blocks: int = 3, residual_scaling: float = 0.2, **kwargs: Any)[source]

Residual-in-Residual Dense Block (RRDB).

See the paper ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks for more details.

Parameters
  • num_features – \(C\) from an expected input of size \((N, C, H, W)\).

  • growth_channels – Number of channels in the latent space.

  • num_dense_blocks – Number of dense blocks used to form an RRDB.

  • residual_scaling – Residual connections scaling factor.

  • **kwargs – Dense block params.
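
The "residual-in-residual" structure nests scaled skip connections: one around each dense block and one around the whole stack. The sketch below shows only that nesting; `RRDBSketch` is a hypothetical name, and a single 3x3 convolution stands in for each real dense block.

```python
import torch
from torch import nn


class RRDBSketch(nn.Module):
    """Hypothetical sketch of nested, scaled residual connections."""

    def __init__(
        self,
        num_features: int = 64,
        num_dense_blocks: int = 3,
        residual_scaling: float = 0.2,
    ) -> None:
        super().__init__()
        # stand-in for a real dense block: a single 3x3 convolution
        self.blocks = nn.ModuleList(
            [
                nn.Conv2d(num_features, num_features, kernel_size=3, padding=1)
                for _ in range(num_dense_blocks)
            ]
        )
        self.residual_scaling = residual_scaling

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = x
        for block in self.blocks:
            # inner residual around each (stand-in) dense block
            out = out + self.residual_scaling * block(out)
        # outer residual around the whole stack
        return x + self.residual_scaling * out


rrdb = RRDBSketch(num_features=16)
y = rrdb(torch.rand(1, 16, 8, 8))
print(y.shape)  # torch.Size([1, 16, 8, 8])
```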

Upsample

class esrgan.model.module.blocks.upsampling.SubPixelConv(num_features: int, scale_factor: int = 2, conv_fn: Union[Callable[[...], torch.nn.modules.module.Module], str, Dict[str, Any]] = functools.partial(<class 'torch.nn.modules.conv.Conv2d'>, kernel_size=(3, 3), padding=1), activation_fn: Union[Callable[[...], torch.nn.modules.module.Module], str, Dict[str, Any]] = <class 'torch.nn.modules.activation.PReLU'>)[source]

Rearranges elements in a tensor of shape \((B, C \times r^2, H, W)\) to a tensor of shape \((B, C, H \times r, W \times r)\).

See the paper Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network for more details.

Parameters
  • num_features – Number of channels in the input tensor.

  • scale_factor – Factor to increase spatial resolution by.

  • conv_fn – Convolution layer params.

  • activation_fn – Activation function to use after sub-pixel convolution.

forward(x: torch.Tensor) → torch.Tensor[source]

Forward pass. Apply conv -> shuffle pixels -> apply nonlinearity.

Parameters

x – Batch of inputs.

Returns

Upscaled input.
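
The conv, shuffle, nonlinearity sequence described above can be reproduced with standard PyTorch layers. This is a minimal sketch using `nn.PixelShuffle`; the channel counts are example values, and the convolution is assumed to expand the channels by a factor of `scale_factor ** 2` so that the shuffle returns to `num_features` channels.

```python
import torch
from torch import nn

num_features, scale_factor = 64, 2

upsample = nn.Sequential(
    # expand channels: C -> C * r^2
    nn.Conv2d(num_features, num_features * scale_factor ** 2, kernel_size=3, padding=1),
    # rearrange (B, C * r^2, H, W) -> (B, C, H * r, W * r)
    nn.PixelShuffle(upscale_factor=scale_factor),
    nn.PReLU(),
)

x = torch.rand(2, num_features, 12, 12)
y = upsample(x)
print(y.shape)  # torch.Size([2, 64, 24, 24])
```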

class esrgan.model.module.blocks.upsampling.InterpolateConv(num_features: int, scale_factor: int = 2, conv_fn: Union[Callable[[...], torch.nn.modules.module.Module], str, Dict[str, Any]] = functools.partial(<class 'torch.nn.modules.conv.Conv2d'>, kernel_size=(3, 3), padding=1), activation_fn: Union[Callable[[...], torch.nn.modules.module.Module], str, Dict[str, Any]] = functools.partial(<class 'torch.nn.modules.activation.LeakyReLU'>, negative_slope=0.2, inplace=True))[source]

Upsamples the given multi-channel 2D (spatial) data.

Parameters
  • num_features – Number of channels in the input tensor.

  • scale_factor – Factor to increase spatial resolution by.

  • conv_fn – Convolutional layer params.

  • activation_fn – Activation function to use after convolution.

forward(x: torch.Tensor) → torch.Tensor[source]

Forward pass. Upscale input -> apply conv -> apply nonlinearity.

Parameters

x – Batch of inputs.

Returns

Upscaled data.
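
The upscale, conv, nonlinearity sequence can be sketched with `torch.nn.functional.interpolate`. The nearest-neighbour mode and the layer sizes below are assumptions for illustration; this reference does not state which interpolation mode the class uses.

```python
import torch
import torch.nn.functional as F
from torch import nn

conv = nn.Conv2d(64, 64, kernel_size=3, padding=1)
activation = nn.LeakyReLU(negative_slope=0.2)


def interpolate_conv(x: torch.Tensor, scale_factor: int = 2) -> torch.Tensor:
    # 1) enlarge the feature maps (interpolation mode is an assumption)
    x = F.interpolate(x, scale_factor=scale_factor, mode="nearest")
    # 2) refine them with a convolution, 3) apply the nonlinearity
    return activation(conv(x))


y = interpolate_conv(torch.rand(2, 64, 12, 12))
print(y.shape)  # torch.Size([2, 64, 24, 24])
```

Unlike the sub-pixel variant, the number of channels stays constant throughout, and only the spatial size changes.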

Misc

class esrgan.model.module.blocks.misc.Conv2dSN(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int]] = (3, 3), stride: Union[int, Tuple[int, int]] = 1, padding: Union[int, Tuple[int, int]] = 0, dilation: Union[int, Tuple[int, int]] = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros', n_power_iterations: int = 1)[source]

nn.Conv2d + spectral normalization.

Applies a 2D convolution over an input signal composed of several input planes. In the simplest case, the output value of the layer with input size \((N, C_{\text{in}}, H, W)\) and output \((N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})\) can be precisely described as:

\[\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)\]

where \(\star\) is the valid 2D cross-correlation operator, \(N\) is the batch size, \(C\) denotes the number of channels, \(H\) is the height of the input planes in pixels, and \(W\) is the width in pixels.

Spectral normalization stabilizes the training of discriminators (critics) in Generative Adversarial Networks (GANs) by rescaling the weight tensor with spectral norm \(\sigma\) of the weight matrix calculated using power iteration method. See Spectral Normalization for Generative Adversarial Networks for details.

Parameters
  • in_channels – Number of channels in the input image.

  • out_channels – Number of channels produced by the convolution.

  • kernel_size – Size of the convolving kernel.

  • stride – Stride of the convolution.

  • padding – Padding added to both sides of the input.

  • dilation – Spacing between kernel elements.

  • groups – Number of blocked connections from input channels to output channels.

  • bias – If True, adds a learnable bias to the output.

  • padding_mode – 'zeros', 'reflect', 'replicate' or 'circular'.

  • n_power_iterations – Number of power iterations to calculate spectral norm.
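
A comparable layer can be obtained in plain PyTorch by wrapping `nn.Conv2d` with `torch.nn.utils.spectral_norm`. Whether the class above is implemented this way is an assumption; the sketch only shows the standard PyTorch utility and example channel sizes.

```python
import torch
from torch import nn
from torch.nn.utils import spectral_norm

# wrap a regular convolution; the weight is rescaled on every forward pass
conv_sn = spectral_norm(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    n_power_iterations=1,
)

images = torch.rand(2, 3, 16, 16)
features = conv_sn(images)
print(features.shape)  # torch.Size([2, 64, 16, 16])

# the unnormalized weight is kept as a separate parameter named `weight_orig`
has_original_weight = hasattr(conv_sn, "weight_orig")
```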

class esrgan.model.module.blocks.misc.LinearSN(in_features: int, out_features: int, bias: bool = True, n_power_iterations: int = 1)[source]

nn.Linear + spectral normalization.

Applies a linear transformation to the incoming data: \(y = xA^T + b\).

Spectral normalization stabilizes the training of discriminators (critics) in Generative Adversarial Networks (GANs) by rescaling the weight tensor with spectral norm \(\sigma\) of the weight matrix calculated using power iteration method. See Spectral Normalization for Generative Adversarial Networks for details.

Parameters
  • in_features – Size of each input sample.

  • out_features – Size of each output sample.

  • bias – If set to False, the layer will not learn an additive bias.

  • n_power_iterations – Number of power iterations to calculate spectral norm.
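
The role of `n_power_iterations` can be illustrated by estimating the spectral norm \(\sigma\) of a weight matrix by hand. The function below is a minimal sketch of the power iteration method, not the library's code; the matrix size and iteration count are arbitrary example values.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
weight = torch.randn(8, 16)  # (out_features, in_features)


def spectral_norm_estimate(w: torch.Tensor, n_power_iterations: int) -> torch.Tensor:
    """Estimate the largest singular value of `w` by power iteration."""
    if n_power_iterations < 1:
        raise ValueError("n_power_iterations must be at least 1")
    u = F.normalize(torch.randn(w.shape[0]), dim=0)
    for _ in range(n_power_iterations):
        v = F.normalize(w.t() @ u, dim=0)  # right singular vector estimate
        u = F.normalize(w @ v, dim=0)  # left singular vector estimate
    return torch.dot(u, w @ v)


sigma = spectral_norm_estimate(weight, n_power_iterations=200)
exact = torch.linalg.svdvals(weight)[0]  # reference value from a full SVD

# dividing by sigma rescales the weight to a spectral norm of roughly 1
normalized_weight = weight / sigma
```

During training a single iteration per step is usually enough, because the estimate is refined incrementally as the weights change slowly between updates.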