easycv.datasets.detection.pipelines package

class easycv.datasets.detection.pipelines.MMToTensor

Bases: object

Transform image to Tensor.

Required key: ‘img’. Modifies key: ‘img’.

Parameters: results (dict) – contains all the information about training.

class easycv.datasets.detection.pipelines.NormalizeTensor(mean, std)

Bases: object

Normalize the Tensor image (CxHxW), with mean and std.

Required key: ‘img’. Modifies key: ‘img’.

Parameters

mean (list[float]) – Mean values of 3 channels.
std (list[float]) – Std values of 3 channels.
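
As a rough illustration of the operation described above, the sketch below normalizes a CxHxW array channel by channel. It uses NumPy instead of a torch Tensor, and `normalize_chw` is a hypothetical helper rather than EasyCV's implementation; the mean and std values are example numbers only.

```python
import numpy as np


def normalize_chw(img, mean, std):
    """Normalize a CxHxW array per channel: (img - mean) / std.

    Hypothetical helper for illustration only.
    """
    # Reshape to (C, 1, 1) so each value broadcasts over one channel.
    mean = np.asarray(mean, dtype=np.float32).reshape(-1, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(-1, 1, 1)
    return (img.astype(np.float32) - mean) / std


# A 3-channel 2x2 image whose channels are constant 10, 20 and 30.
img = np.stack([np.full((2, 2), v, dtype=np.float32) for v in (10, 20, 30)])
out = normalize_chw(img, mean=[10.0, 20.0, 30.0], std=[1.0, 2.0, 5.0])
```

Because every channel equals its own mean here, the normalized array is all zeros.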

class easycv.datasets.detection.pipelines.MMMosaic(img_scale=(640, 640), center_ratio_range=(0.5, 1.5), pad_val=114)

Bases: object

Mosaic augmentation.

Given 4 images, the mosaic transform combines them into one output image. The output image is composed of parts from each sub-image.

                   mosaic transform
                      center_x
           +------------------------------+
           |       pad        |  pad      |
           |      +-----------+           |
           |      |           |           |
           |      |  image1   |--------+  |
           |      |           |        |  |
           |      |           | image2 |  |
center_y   |----+-------------+-----------|
           |    |   cropped   |           |
           |pad |   image3    |  image4   |
           |    |             |           |
           +----|-------------+-----------+
                |             |
                +-------------+

The mosaic transform steps are as follows:

    1. Choose the mosaic center as the intersection of the 4 images.
    2. Get the top-left image according to the index, and randomly
       sample another 3 images from the custom dataset.
    3. A sub-image is cropped if it is larger than its mosaic patch.
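
Step 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: the canvas is assumed to be twice `img_scale` in each dimension, `img_scale` is assumed to be ordered (height, width), and `make_mosaic_canvas` is a hypothetical name.

```python
import random

import numpy as np


def make_mosaic_canvas(img_scale=(640, 640), center_ratio_range=(0.5, 1.5),
                       pad_val=114, rng=None):
    """Create the padded canvas and sample the mosaic center.

    Assumptions: the canvas is 2x img_scale and img_scale is (h, w).
    """
    rng = rng or random.Random()
    h, w = img_scale
    # Every pixel starts as padding; sub-images are pasted in later.
    canvas = np.full((2 * h, 2 * w, 3), pad_val, dtype=np.uint8)
    # The center is a random ratio of the single-image size.
    center_x = int(rng.uniform(*center_ratio_range) * w)
    center_y = int(rng.uniform(*center_ratio_range) * h)
    return canvas, (center_x, center_y)


canvas, (cx, cy) = make_mosaic_canvas(rng=random.Random(0))
```

With the default arguments the center falls between 320 and 960 pixels on each axis of a 1280x1280 canvas.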

Parameters

img_scale (Sequence[int]) – Image size after mosaic pipeline of single image. Defaults to (640, 640).
center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Defaults to (0.5, 1.5).
pad_val (int) – Pad value. Defaults to 114.

get_indexes(dataset)

Call function to collect indexes.

Parameters: dataset (DetImagesMixDataset) – The dataset.
Returns: indexes.
Return type: list

class easycv.datasets.detection.pipelines.MMMixUp(img_scale=(640, 640), ratio_range=(0.5, 1.5), flip_ratio=0.5, pad_val=114, max_iters=15, min_bbox_size=5, min_area_ratio=0.2, max_aspect_ratio=20)

Bases: object

MixUp data augmentation.

                    mixup transform
           +------------------------------+
           | mixup image   |              |
           |      +--------|--------+     |
           |      |        |        |     |
           |---------------+        |     |
           |      |                 |     |
           |      |      image      |     |
           |      |                 |     |
           |      |                 |     |
           |      |-----------------+     |
           |             pad              |
           +------------------------------+

The mixup transform steps are as follows:

   1. Another random image is picked from the dataset and embedded in
      the top-left patch (after padding and resizing).
   2. The target of the mixup transform is the weighted average of the
      mixup image and the original image.
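
The weighted average in step 2 can be sketched as below. The equal 0.5/0.5 weighting is an assumption, since the text above does not state the weights, and `blend_mixup` is a hypothetical helper that expects two images of the same size.

```python
import numpy as np


def blend_mixup(origin_img, mixup_img, weight=0.5):
    """Weighted average of two same-sized images.

    Assumption: equal weights by default; the source does not give them.
    """
    # Convert to float first so uint8 arithmetic cannot overflow.
    origin = origin_img.astype(np.float32)
    other = mixup_img.astype(np.float32)
    return weight * origin + (1.0 - weight) * other


origin_img = np.full((4, 4, 3), 100, dtype=np.uint8)
mixup_img = np.full((4, 4, 3), 200, dtype=np.uint8)
mixed = blend_mixup(origin_img, mixup_img)
```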

Parameters

img_scale (Sequence[int]) – Image output size after mixup pipeline. Default: (640, 640).
ratio_range (Sequence[float]) – Scale ratio of mixup image. Default: (0.5, 1.5).
flip_ratio (float) – Horizontal flip ratio of mixup image. Default: 0.5.
pad_val (int) – Pad value. Default: 114.
max_iters (int) – The maximum number of iterations. If the number of iterations is greater than max_iters, but gt_bbox is still empty, then the iteration is terminated. Default: 15.
min_bbox_size (float) – Width and height threshold to filter bboxes. If the height or width of a box is smaller than this value, it will be removed. Default: 5.
min_area_ratio (float) – Threshold of area ratio between original bboxes and wrapped bboxes. If smaller than this value, the box will be removed. Default: 0.2.
max_aspect_ratio (float) – Aspect ratio of width and height threshold to filter bboxes. If max(h/w, w/h) is larger than this value, the box will be removed. Default: 20.

get_indexes(dataset)

Call function to collect indexes.

Parameters: dataset (DetImagesMixDataset) – The dataset.
Returns: indexes.
Return type: list

class easycv.datasets.detection.pipelines.MMRandomAffine(max_rotate_degree=10.0, max_translate_ratio=0.1, scaling_ratio_range=(0.5, 1.5), max_shear_degree=2.0, border=(0, 0), border_val=(114, 114, 114), min_bbox_size=2, min_area_ratio=0.2, max_aspect_ratio=20)

Bases: object

Random affine transform data augmentation for YOLOX.

This operation randomly generates an affine transform matrix that includes rotation, translation, shear and scaling transforms.
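
One way to build such a matrix is to compose four 3x3 homogeneous matrices, as in the sketch below. The composition order (scale, rotate, shear, then translate), the sampling ranges and all helper names are assumptions made for illustration; the library's actual implementation may differ.

```python
import math
import random

import numpy as np


def rotation_matrix(degrees):
    rad = math.radians(degrees)
    return np.array([[math.cos(rad), -math.sin(rad), 0.0],
                     [math.sin(rad), math.cos(rad), 0.0],
                     [0.0, 0.0, 1.0]])


def scaling_matrix(ratio):
    return np.array([[ratio, 0.0, 0.0], [0.0, ratio, 0.0], [0.0, 0.0, 1.0]])


def shear_matrix(x_degrees, y_degrees):
    return np.array([[1.0, math.tan(math.radians(x_degrees)), 0.0],
                     [math.tan(math.radians(y_degrees)), 1.0, 0.0],
                     [0.0, 0.0, 1.0]])


def translation_matrix(dx, dy):
    return np.array([[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]])


def random_affine_matrix(rng, width, height, max_rotate_degree=10.0,
                         max_translate_ratio=0.1,
                         scaling_ratio_range=(0.5, 1.5),
                         max_shear_degree=2.0):
    """Sample one 3x3 affine matrix (hypothetical helper)."""
    rot = rotation_matrix(
        rng.uniform(-max_rotate_degree, max_rotate_degree))
    scale = scaling_matrix(rng.uniform(*scaling_ratio_range))
    shear = shear_matrix(
        rng.uniform(-max_shear_degree, max_shear_degree),
        rng.uniform(-max_shear_degree, max_shear_degree))
    trans = translation_matrix(
        rng.uniform(-max_translate_ratio, max_translate_ratio) * width,
        rng.uniform(-max_translate_ratio, max_translate_ratio) * height)
    # The rightmost matrix is applied first: scale, rotate, shear, translate.
    return trans @ shear @ rot @ scale


matrix = random_affine_matrix(random.Random(0), width=640, height=640)
```

Setting every range to its neutral value (no rotation, translation or shear, scale 1) yields the identity matrix, which is a quick sanity check for the composition.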

Parameters

max_rotate_degree (float) – Maximum degrees of rotation transform. Default: 10.
max_translate_ratio (float) – Maximum ratio of translation. Default: 0.1.
scaling_ratio_range (tuple[float]) – Min and max ratio of scaling transform. Default: (0.5, 1.5).
max_shear_degree (float) – Maximum degrees of shear transform. Default: 2.
border (tuple[int]) – Distance from height and width sides of input image to adjust output shape. Only used in mosaic dataset. Default: (0, 0).
border_val (tuple[int]) – Border padding values of 3 channels. Default: (114, 114, 114).
min_bbox_size (float) – Width and height threshold to filter bboxes. If the height or width of a box is smaller than this value, it will be removed. Default: 2.
min_area_ratio (float) – Threshold of area ratio between original bboxes and wrapped bboxes. If smaller than this value, the box will be removed. Default: 0.2.
max_aspect_ratio (float) – Aspect ratio of width and height threshold to filter bboxes. If max(h/w, w/h) is larger than this value, the box will be removed. Default: 20.

filter_gt_bboxes(origin_bboxes, wrapped_bboxes)
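
This method is undocumented here. Based only on the descriptions of min_bbox_size, min_area_ratio and max_aspect_ratio above, a filter of this kind might look like the following sketch. The function name `filter_bboxes`, the (x1, y1, x2, y2) box layout, the boolean-mask return value and the `eps` guard are all assumptions.

```python
import numpy as np


def filter_bboxes(origin_bboxes, wrapped_bboxes, min_bbox_size=2,
                  min_area_ratio=0.2, max_aspect_ratio=20, eps=1e-16):
    """Return a boolean mask marking the boxes to keep.

    Boxes are rows of (x1, y1, x2, y2). Hypothetical helper.
    """
    origin_w = origin_bboxes[:, 2] - origin_bboxes[:, 0]
    origin_h = origin_bboxes[:, 3] - origin_bboxes[:, 1]
    w = wrapped_bboxes[:, 2] - wrapped_bboxes[:, 0]
    h = wrapped_bboxes[:, 3] - wrapped_bboxes[:, 1]
    # eps avoids division by zero for degenerate boxes.
    aspect_ratio = np.maximum(w / (h + eps), h / (w + eps))
    area_ratio = (w * h) / (origin_w * origin_h + eps)
    size_ok = (w >= min_bbox_size) & (h >= min_bbox_size)
    area_ok = area_ratio >= min_area_ratio
    aspect_ok = aspect_ratio <= max_aspect_ratio
    return size_ok & area_ok & aspect_ok


origin = np.array([[0, 0, 100, 100]] * 3, dtype=np.float32)
wrapped = np.array([[0, 0, 80, 80],    # large enough: kept
                    [0, 0, 1, 80],     # width below min_bbox_size: removed
                    [0, 0, 30, 30]],   # area ratio 0.09 < 0.2: removed
                   dtype=np.float32)
keep = filter_bboxes(origin, wrapped)
```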

class easycv.datasets.detection.pipelines.MMPhotoMetricDistortion(brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18)

Bases: object

Apply photometric distortions to the image sequentially. Each transformation is applied with a probability of 0.5. Random contrast is applied either second or second to last.

1. random brightness
2. random contrast (mode 0)
3. convert color from BGR to HSV
4. random saturation
5. random hue
6. convert color from HSV to BGR
7. random contrast (mode 1)
8. randomly swap channels
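
The ordering rule can be sketched as a small planner that only decides which random steps run and in what order. It does not touch pixels, and it leaves out the two color-space conversions because they are not random. The function name and the way the mode is drawn are assumptions; the mode numbering follows the list above.

```python
import random


def plan_distortion_steps(rng):
    """Return (mode, steps) for one hypothetical call.

    Each random step fires with probability 0.5. Contrast is considered
    second when mode == 0 and second to last when mode == 1.
    """
    mode = rng.randint(0, 1)
    steps = []

    def maybe(name):
        if rng.random() < 0.5:
            steps.append(name)

    maybe('brightness')
    if mode == 0:
        maybe('contrast')
    maybe('saturation')
    maybe('hue')
    if mode == 1:
        maybe('contrast')
    maybe('swap_channels')
    return mode, steps


mode, steps = plan_distortion_steps(random.Random(0))
```

Whichever mode is drawn, contrast appears at most once in the plan.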

Parameters

brightness_delta (int) – delta of brightness.
contrast_range (tuple) – range of contrast.
saturation_range (tuple) – range of saturation.
hue_delta (int) – delta of hue.

class easycv.datasets.detection.pipelines.MMResize(img_scale=None, multiscale_mode='range', ratio_range=None, keep_ratio=True, bbox_clip_border=True, backend='cv2', override=False)

Bases: object

Resize images & bbox & mask.

This transform resizes the input image to some scale. Bboxes and masks are then resized with the same scale factor. If the input dict contains the key “scale”, then the scale in the input dict is used, otherwise the specified scale in the init method is used. If the input dict contains the key “scale_factor” (if MultiScaleFlipAug does not give img_scale but scale_factor), the actual scale will be computed by image shape and scale_factor.

img_scale can be either a tuple (single-scale) or a list of tuples (multi-scale). There are 3 multiscale modes:

ratio_range is not None: randomly sample a ratio from the ratio range and multiply it with the image scale.
ratio_range is None and multiscale_mode == "range": randomly sample a scale from the multiscale range.
ratio_range is None and multiscale_mode == "value": randomly sample a scale from multiple scales.
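
The three modes can be sketched with a single hypothetical function, `sample_img_scale`. The details are assumptions: in "range" mode the long and short edges are sampled independently between the bounds given by the listed scales, and in ratio mode both edges are multiplied by the same ratio and truncated to integers.

```python
import random


def sample_img_scale(img_scale, multiscale_mode='range', ratio_range=None,
                     rng=None):
    """Pick one target scale according to the three modes above."""
    rng = rng or random.Random()
    scales = img_scale if isinstance(img_scale, list) else [img_scale]
    if ratio_range is not None:
        # Mode 1: multiply a single base scale by a random ratio.
        base = scales[0]
        ratio = rng.uniform(*ratio_range)
        return int(base[0] * ratio), int(base[1] * ratio)
    if len(scales) == 1:
        return scales[0]
    if multiscale_mode == 'range':
        # Mode 2: sample each edge between its lower and upper bound.
        long_edges = [max(s) for s in scales]
        short_edges = [min(s) for s in scales]
        long_edge = rng.randint(min(long_edges), max(long_edges))
        short_edge = rng.randint(min(short_edges), max(short_edges))
        return long_edge, short_edge
    if multiscale_mode == 'value':
        # Mode 3: pick one of the listed scales.
        return rng.choice(scales)
    raise ValueError(f'Unknown multiscale_mode: {multiscale_mode!r}')


candidates = [(1333, 640), (1333, 800)]
ranged = sample_img_scale(candidates, 'range', rng=random.Random(0))
picked = sample_img_scale(candidates, 'value', rng=random.Random(0))
scaled = sample_img_scale((1000, 500), ratio_range=(0.5, 1.5),
                          rng=random.Random(0))
```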

Parameters

img_scale (tuple or list[tuple]) – Images scales for resizing.
multiscale_mode (str) – Either “range” or “value”.
ratio_range (tuple[float]) – (min_ratio, max_ratio)
keep_ratio (bool) – Whether to keep the aspect ratio when resizing the image.
bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. Defaults to True.
backend (str) – Image resize backend, choices are ‘cv2’ and ‘pillow’. These two backends generate slightly different results. Defaults to ‘cv2’.
override (bool, optional) – Whether to override scale and scale_factor so that resize can be called twice. If True, the existing scale and scale_factor are ignored after the first resizing, so that a second resizing is allowed. This option is a workaround for resizing multiple times in DETR. Defaults to False.

static random_select(img_scales)

Randomly select an img_scale from given candidates.

Parameters: img_scales (list[tuple]) – Images scales for selection.
Returns: Returns a tuple (img_scale, scale_idx), where img_scale is the selected image scale and scale_idx is the selected index in the given candidates.
Return type: (tuple, int)

static random_sample(img_scales)

Randomly sample an img_scale when multiscale_mode=='range'.

Parameters: img_scales (list[tuple]) – Images scale range for sampling. There must be two tuples in img_scales, which specify the lower and upper bound of image scales.
Returns: Returns a tuple (img_scale, None), where img_scale is sampled scale and None is just a placeholder to be consistent with random_select().
Return type: (tuple, None)

static random_sample_ratio(img_scale, ratio_range)

Randomly sample an img_scale when ratio_range is specified.

A ratio will be randomly sampled from the range specified by ratio_range. Then it would be multiplied with img_scale to generate sampled scale.

Parameters

img_scale (tuple) – Images scale base to multiply with ratio.
ratio_range (tuple[float]) – The minimum and maximum ratio to scale the img_scale.

Returns

Returns a tuple (scale, None), where scale is sampled ratio multiplied with img_scale and None is just a placeholder to be consistent with random_select().

Return type

(tuple, None)

class easycv.datasets.detection.pipelines.MMRandomFlip(flip_ratio=None, direction='horizontal')

Bases: object

Flip the image & bbox & mask.

If the input dict contains the key “flip”, then the flag will be used, otherwise it will be randomly decided by a ratio specified in the init method.

When random flip is enabled, flip_ratio/direction can either be a float/string or tuple of float/string. There are 3 flip modes:

flip_ratio is float, direction is string: the image will be flipped in the given direction with probability flip_ratio. E.g., with flip_ratio=0.5 and direction='horizontal', the image will be horizontally flipped with probability 0.5.
flip_ratio is float, direction is list of string: the image will be flipped in direction[i] with probability flip_ratio/len(direction). E.g., with flip_ratio=0.5 and direction=['horizontal', 'vertical'], the image will be horizontally flipped with probability 0.25 and vertically flipped with probability 0.25.
flip_ratio is list of float, direction is list of string: given len(flip_ratio) == len(direction), the image will be flipped in direction[i] with probability flip_ratio[i]. E.g., with flip_ratio=[0.3, 0.5] and direction=['horizontal', 'vertical'], the image will be horizontally flipped with probability 0.3 and vertically flipped with probability 0.5.
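
The three modes can be reduced to one weighted draw, as in the sketch below. It assumes the directions are mutually exclusive, so the remaining probability mass means "no flip"; `choose_flip_direction` is a hypothetical helper, not the library's method.

```python
import random


def choose_flip_direction(flip_ratio, direction='horizontal', rng=None):
    """Return the sampled flip direction, or None for no flip."""
    rng = rng or random.Random()
    if isinstance(direction, (list, tuple)):
        directions = list(direction)
    else:
        directions = [direction]
    if isinstance(flip_ratio, (list, tuple)):
        # One probability per direction.
        ratios = list(flip_ratio)
    else:
        # A single probability is split evenly across the directions.
        ratios = [flip_ratio / len(directions)] * len(directions)
    # Whatever probability is left over means the image is not flipped.
    no_flip = max(0.0, 1.0 - sum(ratios))
    return rng.choices(directions + [None], weights=ratios + [no_flip],
                       k=1)[0]


sampled = choose_flip_direction([0.3, 0.5], ['horizontal', 'vertical'],
                                rng=random.Random(0))
```

In the last example the image stays unflipped with the remaining probability 0.2.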

Parameters

flip_ratio (float | list[float], optional) – The flipping probability. Default: None.
direction (str | list[str], optional) – The flipping direction. Options are ‘horizontal’, ‘vertical’, ‘diagonal’. Default: ‘horizontal’. If the input is a list, its length must equal the length of flip_ratio. Each element in flip_ratio indicates the flip probability of the corresponding direction.

bbox_flip(bboxes, img_shape, direction)

Flip bboxes in the given direction.

Parameters

bboxes (numpy.ndarray) – Bounding boxes, shape (…, 4*k)
img_shape (tuple[int]) – Image shape (height, width)
direction (str) – Flip direction. Options are ‘horizontal’, ‘vertical’.

Returns

Flipped bounding boxes.

Return type

numpy.ndarray
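
A minimal sketch of such a flip for the two documented directions is shown below. It assumes boxes are stored as (x1, y1, x2, y2) groups along the last axis, which is consistent with the (…, 4*k) shape above but is not stated explicitly; `flip_bboxes` is a hypothetical name.

```python
import numpy as np


def flip_bboxes(bboxes, img_shape, direction):
    """Flip (x1, y1, x2, y2) boxes stored with shape (..., 4*k)."""
    assert bboxes.shape[-1] % 4 == 0
    flipped = bboxes.copy()
    h, w = img_shape[:2]
    if direction == 'horizontal':
        # The old right edge becomes the new left edge, and vice versa.
        flipped[..., 0::4] = w - bboxes[..., 2::4]
        flipped[..., 2::4] = w - bboxes[..., 0::4]
    elif direction == 'vertical':
        flipped[..., 1::4] = h - bboxes[..., 3::4]
        flipped[..., 3::4] = h - bboxes[..., 1::4]
    else:
        raise ValueError(f'Unsupported direction: {direction!r}')
    return flipped


boxes = np.array([[10.0, 10.0, 50.0, 30.0]])
h_flipped = flip_bboxes(boxes, (100, 200), 'horizontal')
v_flipped = flip_bboxes(boxes, (100, 200), 'vertical')
```

Flipping twice in the same direction returns the original boxes, which makes a convenient check.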

class easycv.datasets.detection.pipelines.MMPad(size=None, size_divisor=None, pad_to_square=False, pad_val=0)

Bases: object

Pad the image & mask.

There are two padding modes: (1) pad to a fixed size and (2) pad to the minimum size that is divisible by some number. Added keys are “pad_shape”, “pad_fixed_size” and “pad_size_divisor”.

Parameters

size (tuple, optional) – Fixed padding size.
size_divisor (int, optional) – The divisor of padded size.
pad_to_square (bool) – Whether to pad the image into a square. Currently only used for YOLOX. Default: False.
pad_val (float, optional) – Padding value, 0 by default.
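
The second padding mode can be illustrated as follows. The sketch rounds each side up to the next multiple of size_divisor and assumes the padding is added at the bottom and right; `pad_to_divisor` is a hypothetical helper.

```python
import math

import numpy as np


def pad_to_divisor(img, size_divisor, pad_val=0):
    """Pad an HxW(xC) image so both sides are divisible by size_divisor.

    Assumption: padding is appended at the bottom and right edges.
    """
    h, w = img.shape[:2]
    pad_h = int(math.ceil(h / size_divisor)) * size_divisor
    pad_w = int(math.ceil(w / size_divisor)) * size_divisor
    padded = np.full((pad_h, pad_w) + img.shape[2:], pad_val,
                     dtype=img.dtype)
    # The original pixels stay in the top-left corner.
    padded[:h, :w] = img
    return padded


img = np.ones((30, 50, 3), dtype=np.uint8)
padded = pad_to_divisor(img, size_divisor=32)
```

A 30x50 image padded with size_divisor=32 becomes 32x64.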

class easycv.datasets.detection.pipelines.MMNormalize(mean, std, to_rgb=True)

Bases: object

Normalize the image.

Added key is “img_norm_cfg”.

Parameters

mean (sequence) – Mean values of 3 channels.
std (sequence) – Std values of 3 channels.
to_rgb (bool) – Whether to convert the image from BGR to RGB. Default: True.

class easycv.datasets.detection.pipelines.LoadImageFromFile(to_float32=False, color_type='color', file_client_args={'backend': 'disk'})

Bases: object

Load an image from file.

Required keys are “img_prefix” and “img_info” (a dict that must contain the key “filename”). Added or updated keys are “filename”, “img”, “img_shape”, “ori_shape” (same as img_shape), “pad_shape” (same as img_shape), “scale_factor” (1.0) and “img_norm_cfg” (means=0 and stds=1).

Parameters

to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is an uint8 array. Defaults to False.
color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to ‘color’.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').

class easycv.datasets.detection.pipelines.LoadMultiChannelImageFromFiles(to_float32=False, color_type='unchanged', file_client_args={'backend': 'disk'})

Bases: object

Load multi-channel images from a list of separate channel files.

Required keys are “img_prefix” and “img_info” (a dict that must contain the key “filename”, which is expected to be a list of filenames). Added or updated keys are “filename”, “img”, “img_shape”, “ori_shape” (same as img_shape), “pad_shape” (same as img_shape), “scale_factor” (1.0) and “img_norm_cfg” (means=0 and stds=1).

Parameters

to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is an uint8 array. Defaults to False.
color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to ‘unchanged’.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').

class easycv.datasets.detection.pipelines.LoadAnnotations(with_bbox=True, with_label=True, with_mask=False, with_seg=False, poly2mask=True, file_client_args={'backend': 'disk'})

Bases: object

Load multiple types of annotations.

Parameters

with_bbox (bool) – Whether to parse and load the bbox annotation. Default: True.
with_label (bool) – Whether to parse and load the label annotation. Default: True.
with_mask (bool) – Whether to parse and load the mask annotation. Default: False.
with_seg (bool) – Whether to parse and load the semantic segmentation annotation. Default: False.
poly2mask (bool) – Whether to convert the instance masks from polygons to bitmaps. Default: True.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').

process_polygons(polygons)

Convert polygons to list of ndarray and filter invalid polygons.

Parameters: polygons (list[list]) – Polygons of one instance.
Returns: Processed polygons.
Return type: list[numpy.ndarray]
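
A sketch of this kind of conversion and filtering is shown below. The validity rule is an assumption: a polygon is treated as valid when it is a flat [x0, y0, x1, y1, ...] list with an even length and at least three points. `keep_valid_polygons` is a hypothetical name.

```python
import numpy as np


def keep_valid_polygons(polygons):
    """Convert polygons to arrays and drop the invalid ones.

    Assumption: valid means an even number of coordinates and at least
    6 numbers (3 points).
    """
    arrays = [np.array(p) for p in polygons]
    return [p for p in arrays if len(p) % 2 == 0 and len(p) >= 6]


polygons = [
    [0, 0, 10, 0, 10, 10],     # triangle: kept
    [0, 0, 5, 5],              # only 2 points: dropped
    [0, 0, 10, 0, 10, 10, 3],  # odd number of coordinates: dropped
]
valid = keep_valid_polygons(polygons)
```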

class easycv.datasets.detection.pipelines.MMMultiScaleFlipAug(transforms, img_scale=None, scale_factor=None, flip=False, flip_direction='horizontal')

Bases: object

Test-time augmentation with multiple scales and flipping.

An example configuration is as follows:

img_scale=[(1333, 400), (1333, 800)],
flip=True,
transforms=[
    dict(type='Resize', keep_ratio=True),
    dict(type='RandomFlip'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='Collect', keys=['img']),
]

After MultiScaleFlipAug with the above configuration, the results are wrapped into lists of the same length as follows:

dict(
    img=[...],
    img_shape=[...],
    scale=[(1333, 400), (1333, 400), (1333, 800), (1333, 800)],
    flip=[False, True, False, True],
    ...
)
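
The way the scale and flip lists line up can be reproduced with a short sketch that enumerates every (scale, flip) combination. `expand_augmentations` and the 'flip_direction' key in its result are assumptions for illustration; no transforms are actually run.

```python
def expand_augmentations(img_scale, flip=False, flip_direction='horizontal'):
    """List the (scale, flip) combinations one image would go through."""
    scales = img_scale if isinstance(img_scale, list) else [img_scale]
    if isinstance(flip_direction, list):
        directions = flip_direction
    else:
        directions = [flip_direction]
    # The unflipped variant always comes first for each scale.
    flip_args = [(False, None)]
    if flip:
        flip_args += [(True, d) for d in directions]
    combos = [(scale, flipped, d)
              for scale in scales
              for flipped, d in flip_args]
    return {
        'scale': [c[0] for c in combos],
        'flip': [c[1] for c in combos],
        'flip_direction': [c[2] for c in combos],
    }


expanded = expand_augmentations([(1333, 400), (1333, 800)], flip=True)
```

For two scales with flipping enabled this gives four entries per key, in the same order as the scale and flip lists shown above.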

Parameters

transforms (list[dict]) – Transforms to apply in each augmentation.
img_scale (tuple | list[tuple] | None) – Images scales for resizing.
scale_factor (float | list[float] | None) – Scale factors for resizing.
flip (bool) – Whether apply flip augmentation. Default: False.
flip_direction (str | list[str]) – Flip augmentation directions, options are “horizontal”, “vertical” and “diagonal”. If flip_direction is a list, multiple flip augmentations will be applied. It has no effect when flip == False. Default: “horizontal”.

Submodules

easycv.datasets.detection.pipelines.mm_transforms module

class easycv.datasets.detection.pipelines.mm_transforms.MMToTensor

Bases: object

Transform image to Tensor.

Required key: ‘img’. Modifies key: ‘img’.

Parameters: results (dict) – contains all the information about training.

class easycv.datasets.detection.pipelines.mm_transforms.NormalizeTensor(mean, std)

Bases: object

Normalize the Tensor image (CxHxW), with mean and std.

Required key: ‘img’. Modifies key: ‘img’.

Parameters

mean (list[float]) – Mean values of 3 channels.
std (list[float]) – Std values of 3 channels.

class easycv.datasets.detection.pipelines.mm_transforms.MMMosaic(img_scale=(640, 640), center_ratio_range=(0.5, 1.5), pad_val=114)

Bases: object

Mosaic augmentation.

Given 4 images, the mosaic transform combines them into one output image. The output image is composed of parts from each sub-image.

                   mosaic transform
                      center_x
           +------------------------------+
           |       pad        |  pad      |
           |      +-----------+           |
           |      |           |           |
           |      |  image1   |--------+  |
           |      |           |        |  |
           |      |           | image2 |  |
center_y   |----+-------------+-----------|
           |    |   cropped   |           |
           |pad |   image3    |  image4   |
           |    |             |           |
           +----|-------------+-----------+
                |             |
                +-------------+

The mosaic transform steps are as follows:

    1. Choose the mosaic center as the intersection of the 4 images.
    2. Get the top-left image according to the index, and randomly
       sample another 3 images from the custom dataset.
    3. A sub-image is cropped if it is larger than its mosaic patch.

Parameters

img_scale (Sequence[int]) – Image size after mosaic pipeline of single image. Defaults to (640, 640).
center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Defaults to (0.5, 1.5).
pad_val (int) – Pad value. Defaults to 114.

get_indexes(dataset)

Call function to collect indexes.

Parameters: dataset (DetImagesMixDataset) – The dataset.
Returns: indexes.
Return type: list

class easycv.datasets.detection.pipelines.mm_transforms.MMMixUp(img_scale=(640, 640), ratio_range=(0.5, 1.5), flip_ratio=0.5, pad_val=114, max_iters=15, min_bbox_size=5, min_area_ratio=0.2, max_aspect_ratio=20)

Bases: object

MixUp data augmentation.

                    mixup transform
           +------------------------------+
           | mixup image   |              |
           |      +--------|--------+     |
           |      |        |        |     |
           |---------------+        |     |
           |      |                 |     |
           |      |      image      |     |
           |      |                 |     |
           |      |                 |     |
           |      |-----------------+     |
           |             pad              |
           +------------------------------+

The mixup transform steps are as follows:

   1. Another random image is picked from the dataset and embedded in
      the top-left patch (after padding and resizing).
   2. The target of the mixup transform is the weighted average of the
      mixup image and the original image.

Parameters

img_scale (Sequence[int]) – Image output size after mixup pipeline. Default: (640, 640).
ratio_range (Sequence[float]) – Scale ratio of mixup image. Default: (0.5, 1.5).
flip_ratio (float) – Horizontal flip ratio of mixup image. Default: 0.5.
pad_val (int) – Pad value. Default: 114.
max_iters (int) – The maximum number of iterations. If the number of iterations is greater than max_iters, but gt_bbox is still empty, then the iteration is terminated. Default: 15.
min_bbox_size (float) – Width and height threshold to filter bboxes. If the height or width of a box is smaller than this value, it will be removed. Default: 5.
min_area_ratio (float) – Threshold of area ratio between original bboxes and wrapped bboxes. If smaller than this value, the box will be removed. Default: 0.2.
max_aspect_ratio (float) – Aspect ratio of width and height threshold to filter bboxes. If max(h/w, w/h) is larger than this value, the box will be removed. Default: 20.

get_indexes(dataset)

Call function to collect indexes.

Parameters: dataset (DetImagesMixDataset) – The dataset.
Returns: indexes.
Return type: list

class easycv.datasets.detection.pipelines.mm_transforms.MMRandomAffine(max_rotate_degree=10.0, max_translate_ratio=0.1, scaling_ratio_range=(0.5, 1.5), max_shear_degree=2.0, border=(0, 0), border_val=(114, 114, 114), min_bbox_size=2, min_area_ratio=0.2, max_aspect_ratio=20)

Bases: object

Random affine transform data augmentation for YOLOX.

This operation randomly generates an affine transform matrix that includes rotation, translation, shear and scaling transforms.

Parameters

max_rotate_degree (float) – Maximum degrees of rotation transform. Default: 10.
max_translate_ratio (float) – Maximum ratio of translation. Default: 0.1.
scaling_ratio_range (tuple[float]) – Min and max ratio of scaling transform. Default: (0.5, 1.5).
max_shear_degree (float) – Maximum degrees of shear transform. Default: 2.
border (tuple[int]) – Distance from height and width sides of input image to adjust output shape. Only used in mosaic dataset. Default: (0, 0).
border_val (tuple[int]) – Border padding values of 3 channels. Default: (114, 114, 114).
min_bbox_size (float) – Width and height threshold to filter bboxes. If the height or width of a box is smaller than this value, it will be removed. Default: 2.
min_area_ratio (float) – Threshold of area ratio between original bboxes and wrapped bboxes. If smaller than this value, the box will be removed. Default: 0.2.
max_aspect_ratio (float) – Aspect ratio of width and height threshold to filter bboxes. If max(h/w, w/h) is larger than this value, the box will be removed. Default: 20.

filter_gt_bboxes(origin_bboxes, wrapped_bboxes)

class easycv.datasets.detection.pipelines.mm_transforms.MMPhotoMetricDistortion(brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18)

Bases: object

Apply photometric distortions to the image sequentially. Each transformation is applied with a probability of 0.5. Random contrast is applied either second or second to last.

1. random brightness
2. random contrast (mode 0)
3. convert color from BGR to HSV
4. random saturation
5. random hue
6. convert color from HSV to BGR
7. random contrast (mode 1)
8. randomly swap channels

Parameters

brightness_delta (int) – delta of brightness.
contrast_range (tuple) – range of contrast.
saturation_range (tuple) – range of saturation.
hue_delta (int) – delta of hue.

class easycv.datasets.detection.pipelines.mm_transforms.MMResize(img_scale=None, multiscale_mode='range', ratio_range=None, keep_ratio=True, bbox_clip_border=True, backend='cv2', override=False)

Bases: object

Resize images & bbox & mask.

This transform resizes the input image to some scale. Bboxes and masks are then resized with the same scale factor. If the input dict contains the key “scale”, then the scale in the input dict is used, otherwise the specified scale in the init method is used. If the input dict contains the key “scale_factor” (if MultiScaleFlipAug does not give img_scale but scale_factor), the actual scale will be computed by image shape and scale_factor.

img_scale can be either a tuple (single-scale) or a list of tuples (multi-scale). There are 3 multiscale modes:

ratio_range is not None: randomly sample a ratio from the ratio range and multiply it with the image scale.
ratio_range is None and multiscale_mode == "range": randomly sample a scale from the multiscale range.
ratio_range is None and multiscale_mode == "value": randomly sample a scale from multiple scales.

Parameters

img_scale (tuple or list[tuple]) – Images scales for resizing.
multiscale_mode (str) – Either “range” or “value”.
ratio_range (tuple[float]) – (min_ratio, max_ratio)
keep_ratio (bool) – Whether to keep the aspect ratio when resizing the image.
bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. Defaults to True.
backend (str) – Image resize backend, choices are ‘cv2’ and ‘pillow’. These two backends generate slightly different results. Defaults to ‘cv2’.
override (bool, optional) – Whether to override scale and scale_factor so that resize can be called twice. If True, the existing scale and scale_factor are ignored after the first resizing, so that a second resizing is allowed. This option is a workaround for resizing multiple times in DETR. Defaults to False.

static random_select(img_scales)

Randomly select an img_scale from given candidates.

Parameters: img_scales (list[tuple]) – Images scales for selection.
Returns: Returns a tuple (img_scale, scale_idx), where img_scale is the selected image scale and scale_idx is the selected index in the given candidates.
Return type: (tuple, int)

static random_sample(img_scales)

Randomly sample an img_scale when multiscale_mode=='range'.

Parameters: img_scales (list[tuple]) – Images scale range for sampling. There must be two tuples in img_scales, which specify the lower and upper bound of image scales.
Returns: Returns a tuple (img_scale, None), where img_scale is sampled scale and None is just a placeholder to be consistent with random_select().
Return type: (tuple, None)

static random_sample_ratio(img_scale, ratio_range)

Randomly sample an img_scale when ratio_range is specified.

A ratio will be randomly sampled from the range specified by ratio_range. Then it would be multiplied with img_scale to generate sampled scale.

Parameters

img_scale (tuple) – Images scale base to multiply with ratio.
ratio_range (tuple[float]) – The minimum and maximum ratio to scale the img_scale.

Returns

Returns a tuple (scale, None), where scale is sampled ratio multiplied with img_scale and None is just a placeholder to be consistent with random_select().

Return type

(tuple, None)

class easycv.datasets.detection.pipelines.mm_transforms.MMRandomFlip(flip_ratio=None, direction='horizontal')

Bases: object

Flip the image & bbox & mask.

If the input dict contains the key “flip”, then the flag will be used, otherwise it will be randomly decided by a ratio specified in the init method.

When random flip is enabled, flip_ratio/direction can either be a float/string or tuple of float/string. There are 3 flip modes:

flip_ratio is float, direction is string: the image will be flipped in the given direction with probability flip_ratio. E.g., with flip_ratio=0.5 and direction='horizontal', the image will be horizontally flipped with probability 0.5.
flip_ratio is float, direction is list of string: the image will be flipped in direction[i] with probability flip_ratio/len(direction). E.g., with flip_ratio=0.5 and direction=['horizontal', 'vertical'], the image will be horizontally flipped with probability 0.25 and vertically flipped with probability 0.25.
flip_ratio is list of float, direction is list of string: given len(flip_ratio) == len(direction), the image will be flipped in direction[i] with probability flip_ratio[i]. E.g., with flip_ratio=[0.3, 0.5] and direction=['horizontal', 'vertical'], the image will be horizontally flipped with probability 0.3 and vertically flipped with probability 0.5.

Parameters

flip_ratio (float | list[float], optional) – The flipping probability. Default: None.
direction (str | list[str], optional) – The flipping direction. Options are ‘horizontal’, ‘vertical’, ‘diagonal’. Default: ‘horizontal’. If input is a list, its length must equal that of flip_ratio. Each element in flip_ratio indicates the flip probability of the corresponding direction.

__init__(flip_ratio=None, direction='horizontal')[source]¶: Initialize self. See help(type(self)) for accurate signature.

bbox_flip(bboxes, img_shape, direction)[source]¶

Flip bboxes in the given direction.

Parameters

bboxes (numpy.ndarray) – Bounding boxes, shape (…, 4*k)
img_shape (tuple[int]) – Image shape (height, width)
direction (str) – Flip direction. Options are ‘horizontal’, ‘vertical’.

Returns

Flipped bounding boxes.

Return type

numpy.ndarray
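
For boxes stored as (x1, y1, x2, y2) groups along the last axis, flipping amounts to mirroring the coordinates and swapping the min/max pair. The following is a minimal NumPy sketch under that layout assumption. flip_bboxes is a hypothetical standalone function, not the method itself, and covers only the two directions listed above.

```python
import numpy as np


def flip_bboxes(bboxes, img_shape, direction):
    """Mirror (x1, y1, x2, y2) boxes inside an image of shape (height, width)."""
    assert bboxes.shape[-1] % 4 == 0
    h, w = img_shape[:2]
    flipped = bboxes.copy()
    if direction == 'horizontal':
        # The new left edge is the mirrored old right edge, and vice versa.
        flipped[..., 0::4] = w - bboxes[..., 2::4]
        flipped[..., 2::4] = w - bboxes[..., 0::4]
    elif direction == 'vertical':
        flipped[..., 1::4] = h - bboxes[..., 3::4]
        flipped[..., 3::4] = h - bboxes[..., 1::4]
    else:
        raise ValueError(f'unsupported direction: {direction}')
    return flipped


boxes = np.array([[10, 20, 30, 40]], dtype=np.float32)
print(flip_bboxes(boxes, (100, 200), 'horizontal'))  # [[170. 20. 190. 40.]]
```

Applying the same flip twice returns the original boxes, which is a convenient sanity check.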

class easycv.datasets.detection.pipelines.mm_transforms.MMPad(size=None, size_divisor=None, pad_to_square=False, pad_val=0)[source]¶

Bases: object

Pad the image & mask.

There are two padding modes: (1) pad to a fixed size and (2) pad to the minimum size that is divisible by some number. Added keys are “pad_shape”, “pad_fixed_size” and “pad_size_divisor”.

Parameters

size (tuple, optional) – Fixed padding size.
size_divisor (int, optional) – The divisor of padded size.
pad_to_square (bool) – Whether to pad the image into a square. Currently only used for YOLOX. Default: False.
pad_val (float, optional) – Padding value, 0 by default.

__init__(size=None, size_divisor=None, pad_to_square=False, pad_val=0)[source]¶: Initialize self. See help(type(self)) for accurate signature.
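
To make the padding modes concrete, the sketch below computes only the target shape for each mode. The helper padded_shape is hypothetical. The precedence between pad_to_square, size and size_divisor, and the reading of size as (height, width), are assumptions not confirmed by this page.

```python
import math


def padded_shape(img_shape, size=None, size_divisor=None, pad_to_square=False):
    """Return the (height, width) an image would be padded to."""
    h, w = img_shape
    if pad_to_square:
        # Assumption: the square side is the longer image side.
        side = max(h, w)
        return (side, side)
    if size is not None:
        # Mode 1: pad to a fixed size.
        return tuple(size)
    if size_divisor is not None:
        # Mode 2: round each side up to the next multiple of size_divisor.
        return (math.ceil(h / size_divisor) * size_divisor,
                math.ceil(w / size_divisor) * size_divisor)
    return (h, w)


print(padded_shape((427, 640), size_divisor=32))  # (448, 640)
```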

class easycv.datasets.detection.pipelines.mm_transforms.MMNormalize(mean, std, to_rgb=True)[source]¶

Bases: object

Normalize the image.

Added key is “img_norm_cfg”.

Parameters

mean (sequence) – Mean values of 3 channels.
std (sequence) – Std values of 3 channels.
to_rgb (bool) – Whether to convert the image from BGR to RGB. Default: True.

__init__(mean, std, to_rgb=True)[source]¶: Initialize self. See help(type(self)) for accurate signature.
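
The normalization itself is per-channel arithmetic. The NumPy sketch below assumes an HxWx3 image in BGR order and mean/std given in the output channel order. normalize_image is a hypothetical helper and does not add the “img_norm_cfg” key.

```python
import numpy as np


def normalize_image(img, mean, std, to_rgb=True):
    """Return (img - mean) / std as float32, optionally converting BGR to RGB."""
    img = img.astype(np.float32)
    if to_rgb:
        # Reverse the channel axis: BGR -> RGB.
        img = img[..., ::-1]
    mean = np.asarray(mean, dtype=np.float32)
    std = np.asarray(std, dtype=np.float32)
    # Broadcasting applies mean and std per channel.
    return (img - mean) / std


bgr = np.full((2, 2, 3), (10, 20, 30), dtype=np.uint8)
out = normalize_image(bgr, mean=(30, 20, 10), std=(1, 1, 1))
```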

class easycv.datasets.detection.pipelines.mm_transforms.LoadImageFromFile(to_float32=False, color_type='color', file_client_args={'backend': 'disk'})[source]¶

Bases: object

Load an image from file.

Required keys are “img_prefix” and “img_info” (a dict that must contain the key “filename”). Added or updated keys are “filename”, “img”, “img_shape”, “ori_shape” (same as img_shape), “pad_shape” (same as img_shape), “scale_factor” (1.0) and “img_norm_cfg” (means=0 and stds=1).

Parameters

to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.
color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to ‘color’.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').

__init__(to_float32=False, color_type='color', file_client_args={'backend': 'disk'})[source]¶: Initialize self. See help(type(self)) for accurate signature.

class easycv.datasets.detection.pipelines.mm_transforms.LoadMultiChannelImageFromFiles(to_float32=False, color_type='unchanged', file_client_args={'backend': 'disk'})[source]¶

Bases: object

Load multi-channel images from a list of separate channel files.

Required keys are “img_prefix” and “img_info” (a dict that must contain the key “filename”, which is expected to be a list of filenames). Added or updated keys are “filename”, “img”, “img_shape”, “ori_shape” (same as img_shape), “pad_shape” (same as img_shape), “scale_factor” (1.0) and “img_norm_cfg” (means=0 and stds=1).

Parameters

to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.
color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to ‘unchanged’.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').

__init__(to_float32=False, color_type='unchanged', file_client_args={'backend': 'disk'})[source]¶: Initialize self. See help(type(self)) for accurate signature.

class easycv.datasets.detection.pipelines.mm_transforms.LoadAnnotations(with_bbox=True, with_label=True, with_mask=False, with_seg=False, poly2mask=True, file_client_args={'backend': 'disk'})[source]¶

Bases: object

Load multiple types of annotations.

Parameters

with_bbox (bool) – Whether to parse and load the bbox annotation. Default: True.
with_label (bool) – Whether to parse and load the label annotation. Default: True.
with_mask (bool) – Whether to parse and load the mask annotation. Default: False.
with_seg (bool) – Whether to parse and load the semantic segmentation annotation. Default: False.
poly2mask (bool) – Whether to convert the instance masks from polygons to bitmaps. Default: True.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').

__init__(with_bbox=True, with_label=True, with_mask=False, with_seg=False, poly2mask=True, file_client_args={'backend': 'disk'})[source]¶: Initialize self. See help(type(self)) for accurate signature.

process_polygons(polygons)[source]¶

Convert polygons to list of ndarray and filter invalid polygons.

Parameters: polygons (list[list]) – Polygons of one instance.
Returns: Processed polygons.
Return type: list[numpy.ndarray]
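
This page does not say what makes a polygon invalid. The sketch below assumes the common convention that a polygon is a flat [x0, y0, x1, y1, ...] list and is valid only if it has an even number of coordinates and at least three points. filter_polygons is a hypothetical helper illustrating that assumed rule.

```python
import numpy as np


def filter_polygons(polygons):
    """Convert flat coordinate lists to arrays and drop malformed ones."""
    valid = []
    for poly in polygons:
        poly = np.array(poly)
        # Assumed rule: (x, y) pairs only, and at least 3 points (6 values).
        if len(poly) % 2 == 0 and len(poly) >= 6:
            valid.append(poly)
    return valid


kept = filter_polygons([
    [0, 0, 1, 0, 1, 1],  # triangle: kept
    [0, 0, 1, 1],        # only two points: dropped
    [0, 0, 1, 0, 1],     # odd number of coordinates: dropped
])
```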

class easycv.datasets.detection.pipelines.mm_transforms.MMMultiScaleFlipAug(transforms, img_scale=None, scale_factor=None, flip=False, flip_direction='horizontal')[source]¶

Bases: object

Test-time augmentation with multiple scales and flipping.

An example configuration is as follows:

img_scale=[(1333, 400), (1333, 800)],
flip=True,
transforms=[
    dict(type='Resize', keep_ratio=True),
    dict(type='RandomFlip'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='Collect', keys=['img']),
]

After MultiScaleFlipAug with the above configuration, the results are wrapped into lists of the same length as follows:

dict(
    img=[...],
    img_shape=[...],
    scale=[(1333, 400), (1333, 400), (1333, 800), (1333, 800)],
    flip=[False, True, False, True],
    ...
)
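
The ordering of the scale and flip lists above follows from enumerating every scale against every flip setting, with the unflipped variant first. The sketch below reproduces only that enumeration. enumerate_augmentations is a hypothetical helper and does not run any transforms.

```python
def enumerate_augmentations(img_scale, flip=False, flip_direction='horizontal'):
    """List the (scale, flip, flip_direction) combinations in output order."""
    scales = img_scale if isinstance(img_scale, list) else [img_scale]
    directions = (flip_direction if isinstance(flip_direction, list)
                  else [flip_direction])
    # The unflipped variant always comes first; flipped variants follow.
    flip_args = [(False, None)]
    if flip:
        flip_args += [(True, d) for d in directions]
    return [dict(scale=s, flip=f, flip_direction=d)
            for s in scales for f, d in flip_args]


augs = enumerate_augmentations([(1333, 400), (1333, 800)], flip=True)
print([a['scale'] for a in augs])
# [(1333, 400), (1333, 400), (1333, 800), (1333, 800)]
print([a['flip'] for a in augs])
# [False, True, False, True]
```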

Parameters

transforms (list[dict]) – Transforms to apply in each augmentation.
img_scale (tuple | list[tuple] | None) – Images scales for resizing.
scale_factor (float | list[float] | None) – Scale factors for resizing.
flip (bool) – Whether to apply flip augmentation. Default: False.
flip_direction (str | list[str]) – Flip augmentation directions, options are “horizontal”, “vertical” and “diagonal”. If flip_direction is a list, multiple flip augmentations will be applied. It has no effect when flip == False. Default: “horizontal”.

__init__(transforms, img_scale=None, scale_factor=None, flip=False, flip_direction='horizontal')[source]¶: Initialize self. See help(type(self)) for accurate signature.