data module

This module provides dataset generators, based on keras.utils.Sequence, for the three main cases in image denoising:

  1. Clean Dataset: Only clean (ground-truth) images are available. Hence, you need to specify a function to corrupt the clean images, so that you can train your network with pairs (noisy, clean). In the section Artificial Noises, we cover built-in functions for adding noise to clean images.
  2. Full Dataset: Both clean and noisy images are available. In that case, the dataset yields the image pairs (clean, noisy).
  3. Blind Dataset: Only noisy images are available. These datasets can be used for qualitative evaluation.

We remark that these classes can be used for training and inference in our Benchmark. It is also noteworthy that MatlabModel objects, which are based on the Matlab Deep Learning Toolbox, need to use the MatlabDatasetWrapper to be trained efficiently. For more information, see the data.MatlabDatasetWrapper and model.MatlabModel class documentation.

Along with these classes, we also provide built-in functionalities for preprocessing, such as Data Augmentation and patch extraction. These are covered in the Preprocessing Functions section.

Data generation

class data.AbstractDatasetGenerator(path, batch_size=32, shuffle=True, name='AbstractDataset', n_channels=1)[source]

Bases: keras.utils.data_utils.Sequence

Dataset generator based on the Keras library. Implementation based on https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly

path

String containing the path to image files directory.

Type:str
batch_size

Size of image batch.

Type:int
n_channels

1 for grayscale, 3 for RGB.

Type:int
shuffle

Whether to shuffle the dataset at each epoch or not.

Type:bool
channels_first

Whether data is formatted as (BatchSize, Channels, Height, Width) (True) or (BatchSize, Height, Width, Channels) (False).

Type:bool
name

String containing the dataset’s name.

Type:str
__init__(self, path, batch_size=32, shuffle=True, name='AbstractDataset', n_channels=1)[source]

Initialize self. See help(type(self)) for accurate signature.

__len__(self)[source]

Number of batches per epoch

__str__(self)[source]

Returns the dataset name.

on_epoch_end(self)[source]

Defines and shuffles indexes on epoch end
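
The batching logic these generators inherit can be sketched without Keras. The following stand-alone illustration (the class name and data are hypothetical, not part of the library) mimics how __len__, __getitem__ and on_epoch_end cooperate:

```python
import numpy as np

class MiniDatasetGenerator:
    """Illustrative stand-in for AbstractDatasetGenerator's batching logic."""

    def __init__(self, n_samples, batch_size=32, shuffle=True):
        self.n_samples = n_samples
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.on_epoch_end()

    def __len__(self):
        # Number of complete batches per epoch (remainder is dropped).
        return self.n_samples // self.batch_size

    def __getitem__(self, i):
        # Sample indexes of the i-th batch; a real generator would
        # load and preprocess the corresponding images here.
        return self.indexes[i * self.batch_size:(i + 1) * self.batch_size]

    def on_epoch_end(self):
        # Redefine (and optionally shuffle) sample indexes after each epoch.
        self.indexes = np.arange(self.n_samples)
        if self.shuffle:
            np.random.shuffle(self.indexes)

gen = MiniDatasetGenerator(100, batch_size=32, shuffle=False)
print(len(gen))     # 3
print(gen[1][:3])   # first indexes of the second batch: 32, 33, 34
```

Keras calls on_epoch_end automatically between epochs, so with shuffle=True the composition of each batch changes at every pass over the data.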

class data.BlindDatasetGenerator(path, batch_size=32, shuffle=True, name='CleanDataset', n_channels=1, preprocessing=None, target_fcn=None)[source]

Bases: OpenDenoising.data.abstract_dataset.AbstractDatasetGenerator

Dataset generator based on Keras library. This class is used for Blind denoising problems, where only noisy images are available.

path

String containing the path to image files directory.

Type:str
batch_size

Size of image batch.

Type:int
n_channels

1 for grayscale, 3 for RGB.

Type:int
shuffle

Whether to shuffle the dataset at each epoch or not.

Type:bool
name

String containing the dataset’s name.

Type:str
preprocessing

List of preprocessing functions, which will be applied to each image.

Type:list
target_fcn

Function implementing how to generate target images from noisy ones.

Type:function
__getitem__(self, i)[source]

Generates image batches from filenames.

Parameters:i (int) – Batch index to get.
Returns:
  • inp (numpy.ndarray) – Batch of noisy images.
  • ref (numpy.ndarray) – Batch of target images.
__init__(self, path, batch_size=32, shuffle=True, name='CleanDataset', n_channels=1, preprocessing=None, target_fcn=None)[source]

Initialize self. See help(type(self)) for accurate signature.

__next__(self)[source]

Returns image batches sequentially.

__repr__(self)[source]

Return repr(self).

class data.CleanDatasetGenerator(path, batch_size=32, noise_config=None, shuffle=True, name='CleanDataset', n_channels=1, preprocessing=None)[source]

Bases: OpenDenoising.data.abstract_dataset.AbstractDatasetGenerator

Dataset generator based on Keras library. This class is used for non-blind denoising problems where only clean images are available. To use such dataset to train denoising networks, you need to specify a type of artificial noise that will be added to each clean image.

path

String containing the path to image files directory.

Type:str
batch_size

Size of image batch.

Type:int
n_channels

1 for grayscale, 3 for RGB.

Type:int
shuffle

Whether to shuffle the dataset at each epoch or not.

Type:bool
channels_first

Whether data is formatted as (BatchSize, Channels, Height, Width) (True) or (BatchSize, Height, Width, Channels) (False).

Type:bool
name

String containing the dataset’s name.

Type:str
preprocessing

List of preprocessing functions, which will be applied to each image.

Type:list
noise_config

Dictionary whose keys are functions implementing the noise process, and the value is a list containing the noise function parameters. If you do not want to specify any parameters, your list should be empty.

Type:dict

Examples

The following example corresponds to a Dataset Generator which reads images from “./images”, yields batches of length 32, and applies Gaussian noise with intensity drawn uniformly from the range [0, 55], followed by “Super Resolution noise” of intensity 4. Moreover, the dataset shuffles the data, yields it in NHWC format, and does not apply any preprocessing function. NOTE: each parameter list should follow the same order as the corresponding noise function’s arguments.

>>> from OpenDenoising import data
>>> noise_config = {data.utils.gaussian_blind_noise: [0, 55],
...                 data.utils.super_resolution_noise: [4]}
>>> datagen = data.CleanDatasetGenerator("./images", 32, noise_config, True, "MyData", 1, None)
__getitem__(self, i)[source]

Generates batches of data.

__init__(self, path, batch_size=32, noise_config=None, shuffle=True, name='CleanDataset', n_channels=1, preprocessing=None)[source]

Initialize self. See help(type(self)) for accurate signature.

__repr__(self)[source]

Return repr(self).

__str__(self)[source]

Returns the dataset name.

class data.FullDatasetGenerator(path, batch_size=32, shuffle=True, name='FullDataset', n_channels=1, preprocessing=None)[source]

Bases: OpenDenoising.data.abstract_dataset.AbstractDatasetGenerator

Dataset generator based on the Keras library. This class is used for non-blind denoising problems. Unlike the CleanDatasetGenerator class, this class corresponds to the case where both clean and noisy samples are available and paired (for each noisy image, there is one and only one clean image with the same filename).

path

String containing the path to image files directory.

Type:str
batch_size

Size of image batch.

Type:int
n_channels

1 for grayscale, 3 for RGB.

Type:int
shuffle

Whether to shuffle the dataset at each epoch or not.

Type:bool
name

String containing the dataset’s name.

Type:str
preprocessing

List of preprocessing functions, which will be applied to each image.

Type:list
__getitem__(self, i)[source]

Generate batches of data

__init__(self, path, batch_size=32, shuffle=True, name='FullDataset', n_channels=1, preprocessing=None)[source]

Initialize self. See help(type(self)) for accurate signature.

__next__(self)[source]

Returns image batches sequentially.

__repr__(self)[source]

Return repr(self).

class data.MatlabDatasetWrapper(engine, images_path='./tmp/BSDS500/Train/ref', partition='Train', ext=None, patch_size=40, n_patches=16, noiseFcn="@(I) imnoise(I, 'gaussian', 0, 25/255)", channel_format='grayscale', type='Clean')[source]

Bases: object

This class wraps the FullMatlabDataset and CleanMatlabDataset classes. It makes internal calls to Matlab through Matlab’s Python engine to load one of these classes into the workspace.

Notes

This class does not implement the interface provided by keras.utils.Sequence. It should only be used alongside MatlabModel objects.

engine

Matlab engine instance.

Type:matlab.engine.MatlabEngine
images_path

Path to the directory containing the dataset’s images.

Type:str
partition

String containing the dataset partition (‘Train’, ‘Valid’ or ‘Test’).

Type:str
ext

Dictionary holding image extensions. Note that only the dictionary’s keys are used.

Type:dict
patch_size

Size of patches to be extracted.

Type:int
n_patches

Number of patches to be extracted from each image.

Type:int
noiseFcn

For CleanDataset only. Specifies the noising function that will be applied to images. It should be a function that accepts an image as input and returns another image. If you need to specify parameters, you can do so by using Matlab’s anonymous function syntax (for instance, noiseFcn=”@(I) imnoise(I, ‘gaussian’, 0, 25/255)”).

Type:str
channel_format

String containing ‘grayscale’ for grayscale images, or ‘RGB’ for RGB images.

Type:str
type

String containing Clean (for CleanDataset) or Full (for FullDataset).

Type:str

See also

model.MatlabModel
the type of model with which this class was designed to interact.
__call__(self)[source]

Call self as a function.

__init__(self, engine, images_path='./tmp/BSDS500/Train/ref', partition='Train', ext=None, patch_size=40, n_patches=16, noiseFcn="@(I) imnoise(I, 'gaussian', 0, 25/255)", channel_format='grayscale', type='Clean')[source]

Initialize self. See help(type(self)) for accurate signature.

Artificial Noises

data.gaussian_noise(ref, noise_level=15)[source]

Gaussian noise

Parameters:
  • ref (numpy.ndarray) – Image to be noised.
  • noise_level (int) – Level of corruption. Always give the noise_level in terms of the 0-255 pixel intensity range.
Returns:

inp – Noised image.

Return type:

numpy.ndarray
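
A plausible NumPy implementation of such a corruption, assuming images normalized to the [0, 1] range (a sketch; the library’s actual code may differ, e.g. in clipping or dtype handling):

```python
import numpy as np

def gaussian_noise(ref, noise_level=15):
    """Add zero-mean Gaussian noise; noise_level is the standard deviation
    expressed on the 0-255 scale, rescaled here for [0, 1] images."""
    noise = np.random.normal(0.0, noise_level / 255.0, ref.shape)
    inp = np.clip(ref + noise, 0.0, 1.0)  # keep pixels in the valid range
    return inp.astype(ref.dtype)

clean = np.full((8, 8, 1), 0.5)
noisy = gaussian_noise(clean, noise_level=25)
print(noisy.shape)  # (8, 8, 1)
```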

data.poisson_noise(ref)[source]

Poisson noise

Parameters:ref (numpy.ndarray) – Image to be noised.
Returns:inp – Noised image.
Return type:numpy.ndarray
data.salt_and_pepper_noise(ref, noise_level=15)[source]

Salt and pepper noise

Parameters:
  • ref (numpy.ndarray) – Image to be noised.
  • noise_level (int) – Level of corruption.
Returns:

inp – Noised image.

Return type:

numpy.ndarray
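
For reference, salt-and-pepper corruption can be sketched in NumPy as follows, assuming noise_level is interpreted as the percentage of corrupted pixels and images lie in [0, 1] (assumptions, not confirmed by the library):

```python
import numpy as np

def salt_and_pepper_noise(ref, noise_level=15):
    """Set a fraction noise_level/100 of pixels to pepper (0) or salt (1)."""
    inp = ref.copy()
    p = noise_level / 100.0
    mask = np.random.rand(*ref.shape)
    inp[mask < p / 2] = 0.0       # pepper: darkest value
    inp[mask > 1 - p / 2] = 1.0   # salt: brightest value
    return inp
```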

data.speckle_noise(ref, noise_level=15)[source]

Speckle noise

Parameters:
  • ref (numpy.ndarray) – Image to be noised.
  • noise_level (int) – Level of corruption.
Returns:

inp – Noised image.

Return type:

numpy.ndarray

data.super_resolution_noise(ref, noise_level=2)[source]

Noise due to down-sampling followed by up-sampling an image

Parameters:
  • ref (numpy.ndarray) – Image to be noised.
  • noise_level (int) – Scaling factor. For instance, for a (512, 512) image, a factor of 2 down-samples the image to (256, 256), then up-samples it again to (512, 512).
Returns:

inp – Noised image.

Return type:

numpy.ndarray
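
The down-sample/up-sample process can be sketched in NumPy with simple pixel-skipping and repetition (the library may instead use proper interpolation; this is only an illustration of the principle):

```python
import numpy as np

def super_resolution_noise(ref, noise_level=2):
    """Down-sample by keeping every noise_level-th pixel, then up-sample
    by pixel repetition, degrading high-frequency detail."""
    small = ref[::noise_level, ::noise_level]  # e.g. (512, 512) -> (256, 256)
    inp = np.repeat(np.repeat(small, noise_level, axis=0), noise_level, axis=1)
    return inp[:ref.shape[0], :ref.shape[1]]   # crop back to the original size

ref = np.arange(16.0).reshape(4, 4)
out = super_resolution_noise(ref, noise_level=2)
print(out.shape)  # (4, 4)
```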

Preprocessing Functions

data.dncnn_augmentation(inp, ref=None, aug_times=1, channels_first=False)[source]

Data augmentation policy employed on DnCNN

Parameters:
  • inp (numpy.ndarray) – Noised image.
  • ref (numpy.ndarray) – Ground-truth images.
  • aug_times (int) – Number of times augmentation is applied.
Returns:

data.gen_patches(inp, ref, patch_size, channels_first=False, mode='sequential', n_patches=-1)[source]

Patch generation function.

Parameters:
  • inp (numpy.ndarray) – Noised image from which patches will be extracted.
  • ref (numpy.ndarray) – Reference image from which patches will be extracted.
  • patch_size (int) – Size of patch window (number of pixels in each axis).
  • channels_first (bool) – Whether data is formatted as NCHW (True) or NHWC (False).
  • mode (str) – One between {‘sequential’, ‘random’}. If mode = ‘sequential’, extracts patches sequentially on each axis. If mode = ‘random’, extracts patches randomly.
  • n_patches (int) –

    Number of patches to be extracted from the image. Should be specified only if mode = ‘random’. If not specified, then,

    \[n\_patches = \dfrac{h \times w}{patch_{size}^{2}}\]
Returns:
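
The sequential mode can be sketched in NumPy as follows (a simplified, hypothetical version operating on a single HWC image pair rather than a batch, and extracting non-overlapping patches):

```python
import numpy as np

def gen_patches(inp, ref, patch_size):
    """Extract non-overlapping patches sequentially along each spatial axis
    of an HWC image pair; partial patches at the borders are dropped."""
    h, w = inp.shape[:2]
    inp_patches, ref_patches = [], []
    for i in range(0, h - patch_size + 1, patch_size):
        for j in range(0, w - patch_size + 1, patch_size):
            inp_patches.append(inp[i:i + patch_size, j:j + patch_size])
            ref_patches.append(ref[i:i + patch_size, j:j + patch_size])
    return np.stack(inp_patches), np.stack(ref_patches)

img = np.zeros((80, 120, 1))
p_inp, p_ref = gen_patches(img, img, patch_size=40)
print(p_inp.shape)  # (6, 40, 40, 1): (80 * 120) / 40**2 = 6 patches
```

This matches the default patch count above: for an h × w image, sequential extraction yields h·w / patch_size² patches.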