easycv.datasets.detection.pipelines package¶
- class easycv.datasets.detection.pipelines.MMToTensor[source]¶
Bases: object
Transform image to Tensor.
Required key: ‘img’. Modifies key: ‘img’.
- Parameters
results (dict) – contain all information about training.
- class easycv.datasets.detection.pipelines.NormalizeTensor(mean, std)[source]¶
Bases: object
Normalize the Tensor image (CxHxW), with mean and std.
Required key: ‘img’. Modifies key: ‘img’.
- Parameters
mean (list[float]) – Mean values of 3 channels.
std (list[float]) – Std values of 3 channels.
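The per-channel arithmetic described above can be sketched in plain Python. This is only an illustration of `(pixel - mean) / std` applied channel by channel to a CxHxW image stored as nested lists; the helper name `normalize_chw` is hypothetical and is not part of EasyCV, which operates on tensors.

```python
def normalize_chw(img, mean, std):
    """Normalize a CxHxW image (nested lists), one channel at a time.

    Hypothetical helper for illustration; not the EasyCV implementation.
    """
    if not (len(img) == len(mean) == len(std)):
        raise ValueError('img, mean and std must have the same channel count')
    return [
        [[(pixel - m) / s for pixel in row] for row in channel]
        for channel, m, s in zip(img, mean, std)
    ]


# A 3-channel image with a single row of two pixels per channel.
img = [[[0.0, 255.0]], [[127.5, 127.5]], [[255.0, 0.0]]]
out = normalize_chw(img, mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5])
```

Each channel uses its own mean and std, which is why both arguments are lists with one entry per channel.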
- class easycv.datasets.detection.pipelines.MMMosaic(img_scale=(640, 640), center_ratio_range=(0.5, 1.5), pad_val=114)[source]¶
Bases: object
Mosaic augmentation.
Given 4 images, mosaic transform combines them into one output image. The output image is composed of the parts from each sub-image.
                    mosaic transform
                       center_x
            +------------------------------+
            |       pad        |  pad      |
            |      +-----------+           |
            |      |           |           |
            |      |  image1   |--------+  |
            |      |           |        |  |
            |      |           | image2 |  |
 center_y   |----+-------------+-----------|
            |    |   cropped   |           |
            |pad |   image3    |  image4   |
            |    |             |           |
            +----|-------------+-----------+
                 |             |
                 +-------------+
The mosaic transform steps are as follows:
1. Choose the mosaic center as the intersections of 4 images
2. Get the left top image according to the index, and randomly sample another 3 images from the custom dataset.
3. Sub image will be cropped if image is larger than mosaic patch
- Parameters
img_scale (Sequence[int]) – Image size after mosaic pipeline of single image. Defaults to (640, 640).
center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Defaults to (0.5, 1.5).
pad_val (int) – Pad value. Defaults to 114.
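Step 1 above (choosing the mosaic center) can be sketched as follows. The sketch assumes that `img_scale` is ordered (height, width), that the output canvas is twice `img_scale` in each dimension, and that the center is drawn uniformly from `center_ratio_range` times the single-image size; these are assumptions for illustration, and `sample_mosaic_center` is a hypothetical helper, not an EasyCV function.

```python
import random


def sample_mosaic_center(img_scale=(640, 640), center_ratio_range=(0.5, 1.5),
                         rng=random):
    """Sample a mosaic center and report the assumed canvas size.

    Hypothetical helper; the (height, width) order and the 2x canvas
    are assumptions, not confirmed by the documentation above.
    """
    height, width = img_scale
    center_x = int(rng.uniform(*center_ratio_range) * width)
    center_y = int(rng.uniform(*center_ratio_range) * height)
    canvas_shape = (2 * height, 2 * width)
    return (center_x, center_y), canvas_shape


center, canvas = sample_mosaic_center(rng=random.Random(0))
```

With the default range (0.5, 1.5), the center always falls in the middle half of the assumed canvas, so each of the four images keeps a visible region.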
- class easycv.datasets.detection.pipelines.MMMixUp(img_scale=(640, 640), ratio_range=(0.5, 1.5), flip_ratio=0.5, pad_val=114, max_iters=15, min_bbox_size=5, min_area_ratio=0.2, max_aspect_ratio=20)[source]¶
Bases: object
MixUp data augmentation.
                mixup transform
       +------------------------------+
       | mixup image   |              |
       |      +--------|--------+     |
       |      |        |        |     |
       |---------------+        |     |
       |      |                 |     |
       |      |      image      |     |
       |      |                 |     |
       |      |                 |     |
       |      |-----------------+     |
       |             pad              |
       +------------------------------+
The mixup transform steps are as follows:
1. Another random image is picked by dataset and embedded in the top left patch (after padding and resizing)
2. The target of mixup transform is the weighted average of mixup image and origin image.
- Parameters
img_scale (Sequence[int]) – Image output size after mixup pipeline. Default: (640, 640).
ratio_range (Sequence[float]) – Scale ratio of mixup image. Default: (0.5, 1.5).
flip_ratio (float) – Horizontal flip ratio of mixup image. Default: 0.5.
pad_val (int) – Pad value. Default: 114.
max_iters (int) – The maximum number of iterations. If the number of iterations is greater than max_iters, but gt_bbox is still empty, then the iteration is terminated. Default: 15.
min_bbox_size (float) – Width and height threshold to filter bboxes. If the height or width of a box is smaller than this value, it will be removed. Default: 5.
min_area_ratio (float) – Threshold of area ratio between original bboxes and warped bboxes. If smaller than this value, the box will be removed. Default: 0.2.
max_aspect_ratio (float) – Aspect ratio of width and height threshold to filter bboxes. If max(h/w, w/h) larger than this value, the box will be removed. Default: 20.
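The three box-filtering thresholds (`min_bbox_size`, `min_area_ratio`, `max_aspect_ratio`) can be illustrated with a small predicate. This is a sketch under assumptions: boxes are `(x1, y1, x2, y2)` tuples, comparisons are strict, and `keep_box` is a hypothetical name rather than an EasyCV function.

```python
def keep_box(orig_box, new_box, min_bbox_size=5, min_area_ratio=0.2,
             max_aspect_ratio=20, eps=1e-16):
    """Return True if the transformed box passes all three filters.

    Hypothetical helper; boxes are (x1, y1, x2, y2). Handling of values
    exactly on a threshold is an assumption.
    """
    orig_w, orig_h = orig_box[2] - orig_box[0], orig_box[3] - orig_box[1]
    new_w, new_h = new_box[2] - new_box[0], new_box[3] - new_box[1]
    # max(h/w, w/h), guarded against division by zero.
    aspect_ratio = max(new_w / (new_h + eps), new_h / (new_w + eps))
    # Area of the transformed box relative to the original box.
    area_ratio = (new_w * new_h) / (orig_w * orig_h + eps)
    return (new_w > min_bbox_size and new_h > min_bbox_size
            and area_ratio > min_area_ratio
            and aspect_ratio < max_aspect_ratio)


kept = keep_box((0, 0, 100, 100), (0, 0, 50, 50))      # passes every filter
too_thin = keep_box((0, 0, 100, 100), (0, 0, 3, 50))   # width below 5
```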
- class easycv.datasets.detection.pipelines.MMRandomAffine(max_rotate_degree=10.0, max_translate_ratio=0.1, scaling_ratio_range=(0.5, 1.5), max_shear_degree=2.0, border=(0, 0), border_val=(114, 114, 114), min_bbox_size=2, min_area_ratio=0.2, max_aspect_ratio=20)[source]¶
Bases: object
Random affine transform data augmentation for YOLOX.
This operation randomly generates an affine transform matrix that includes rotation, translation, shear and scaling transforms.
- Parameters
max_rotate_degree (float) – Maximum degrees of rotation transform. Default: 10.
max_translate_ratio (float) – Maximum ratio of translation. Default: 0.1.
scaling_ratio_range (tuple[float]) – Min and max ratio of scaling transform. Default: (0.5, 1.5).
max_shear_degree (float) – Maximum degrees of shear transform. Default: 2.
border (tuple[int]) – Distance from height and width sides of input image to adjust output shape. Only used in mosaic dataset. Default: (0, 0).
border_val (tuple[int]) – Border padding values of 3 channels. Default: (114, 114, 114).
min_bbox_size (float) – Width and height threshold to filter bboxes. If the height or width of a box is smaller than this value, it will be removed. Default: 2.
min_area_ratio (float) – Threshold of area ratio between original bboxes and warped bboxes. If smaller than this value, the box will be removed. Default: 0.2.
max_aspect_ratio (float) – Aspect ratio of width and height threshold to filter bboxes. If max(h/w, w/h) larger than this value, the box will be removed. Default: 20.
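How the four sampled components might be combined into one matrix can be sketched in pure Python. The composition order (scale, then rotate, then shear, then translate), the use of homogeneous 3x3 matrices, and the helper names `matmul3` and `sample_affine_matrix` are assumptions made for illustration; the documentation above does not specify them.

```python
import math
import random


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def sample_affine_matrix(width, height, max_rotate_degree=10.0,
                         max_translate_ratio=0.1,
                         scaling_ratio_range=(0.5, 1.5),
                         max_shear_degree=2.0, rng=random):
    """Sample one homogeneous 3x3 affine matrix (hypothetical helper)."""
    theta = math.radians(rng.uniform(-max_rotate_degree, max_rotate_degree))
    rotation = [[math.cos(theta), -math.sin(theta), 0.0],
                [math.sin(theta), math.cos(theta), 0.0],
                [0.0, 0.0, 1.0]]

    scale = rng.uniform(*scaling_ratio_range)
    scaling = [[scale, 0.0, 0.0], [0.0, scale, 0.0], [0.0, 0.0, 1.0]]

    shear_x = math.radians(rng.uniform(-max_shear_degree, max_shear_degree))
    shear_y = math.radians(rng.uniform(-max_shear_degree, max_shear_degree))
    shear = [[1.0, math.tan(shear_x), 0.0],
             [math.tan(shear_y), 1.0, 0.0],
             [0.0, 0.0, 1.0]]

    trans_x = rng.uniform(-max_translate_ratio, max_translate_ratio) * width
    trans_y = rng.uniform(-max_translate_ratio, max_translate_ratio) * height
    translation = [[1.0, 0.0, trans_x], [0.0, 1.0, trans_y], [0.0, 0.0, 1.0]]

    # Assumed order: scaling is applied first, translation last.
    return matmul3(translation, matmul3(shear, matmul3(rotation, scaling)))


identity = sample_affine_matrix(640, 640, 0.0, 0.0, (1.0, 1.0), 0.0)
sampled = sample_affine_matrix(640, 480, rng=random.Random(0))
```

Setting every range to zero (and the scaling range to (1.0, 1.0)) yields the identity matrix, which is a convenient sanity check for any composition order.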
- class easycv.datasets.detection.pipelines.MMPhotoMetricDistortion(brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18)[source]¶
Bases: object
Apply photometric distortion to the image sequentially; every transformation is applied with a probability of 0.5. Random contrast is applied either second or second to last.
1. random brightness
2. random contrast (mode 0)
3. convert color from BGR to HSV
4. random saturation
5. random hue
6. convert color from HSV to BGR
7. random contrast (mode 1)
8. randomly swap channels
- Parameters
brightness_delta (int) – delta of brightness.
contrast_range (tuple) – range of contrast.
saturation_range (tuple) – range of saturation.
hue_delta (int) – delta of hue.
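The ordering rule (random contrast runs second in mode 0 and second to last in mode 1) can be made concrete with a short sketch. It only plans which steps would run and does not touch pixels. The helper names are hypothetical, and treating the two color-space conversions as always applied is an assumption.

```python
import random


def distortion_steps(mode):
    """Return the ordered step names for contrast mode 0 or 1."""
    steps = ['random brightness']
    if mode == 0:
        steps.append('random contrast')
    steps += ['convert color from BGR to HSV', 'random saturation',
              'random hue', 'convert color from HSV to BGR']
    if mode == 1:
        steps.append('random contrast')
    steps.append('randomly swap channels')
    return steps


def sample_distortion_plan(rng=random):
    """Pick a mode, then decide per random step whether it is applied."""
    mode = rng.randint(0, 1)
    plan = []
    for step in distortion_steps(mode):
        # Random steps fire with probability 0.5; conversions are assumed
        # to run unconditionally.
        applied = rng.random() < 0.5 if step.startswith('random') else True
        plan.append((step, applied))
    return mode, plan


mode, plan = sample_distortion_plan(random.Random(0))
```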
- class easycv.datasets.detection.pipelines.MMResize(img_scale=None, multiscale_mode='range', ratio_range=None, keep_ratio=True, bbox_clip_border=True, backend='cv2', override=False)[source]¶
Bases: object
Resize images & bbox & mask.
This transform resizes the input image to some scale. Bboxes and masks are then resized with the same scale factor. If the input dict contains the key “scale”, then the scale in the input dict is used, otherwise the specified scale in the init method is used. If the input dict contains the key “scale_factor” (if MultiScaleFlipAug does not give img_scale but scale_factor), the actual scale will be computed by image shape and scale_factor.
img_scale can either be a tuple (single-scale) or a list of tuples (multi-scale). There are 3 multiscale modes:
- ratio_range is not None: randomly sample a ratio from the ratio range and multiply it with the image scale.
- ratio_range is None and multiscale_mode == "range": randomly sample a scale from the multiscale range.
- ratio_range is None and multiscale_mode == "value": randomly sample a scale from multiple scales.
- Parameters
img_scale (tuple or list[tuple]) – Images scales for resizing.
multiscale_mode (str) – Either “range” or “value”.
ratio_range (tuple[float]) – (min_ratio, max_ratio)
keep_ratio (bool) – Whether to keep the aspect ratio when resizing the image.
bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. Defaults to True.
backend (str) – Image resize backend, choices are ‘cv2’ and ‘pillow’. These two backends generate slightly different results. Defaults to ‘cv2’.
override (bool, optional) – Whether to override scale and scale_factor so as to call resize twice. If True, after the first resizing, the existing scale and scale_factor will be ignored so that a second resizing is allowed. This option is a work-around for resizing multiple times in DETR. Defaults to False.
- __init__(img_scale=None, multiscale_mode='range', ratio_range=None, keep_ratio=True, bbox_clip_border=True, backend='cv2', override=False)[source]¶
Initialize self. See help(type(self)) for accurate signature.
- static random_select(img_scales)[source]¶
Randomly select an img_scale from given candidates.
- Parameters
img_scales (list[tuple]) – Images scales for selection.
- Returns
Returns a tuple (img_scale, scale_idx), where img_scale is the selected image scale and scale_idx is the selected index in the given candidates.
- Return type
(tuple, int)
- static random_sample(img_scales)[source]¶
Randomly sample an img_scale when multiscale_mode=='range'.
- Parameters
img_scales (list[tuple]) – Images scale range for sampling. There must be two tuples in img_scales, which specify the lower and upper bound of image scales.
- Returns
Returns a tuple (img_scale, None), where img_scale is sampled scale and None is just a placeholder to be consistent with random_select().
- Return type
(tuple, None)
- static random_sample_ratio(img_scale, ratio_range)[source]¶
Randomly sample an img_scale when ratio_range is specified.
A ratio will be randomly sampled from the range specified by ratio_range. It is then multiplied with img_scale to generate the sampled scale.
- Parameters
img_scale (tuple) – Image scale base to multiply with the ratio.
ratio_range (tuple[float]) – The minimum and maximum ratio to scale the img_scale.
- Returns
Returns a tuple (scale, None), where scale is the sampled ratio multiplied with img_scale and None is just a placeholder to be consistent with random_select().
- Return type
(tuple, None)
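The three multiscale modes of MMResize and the return conventions of the static methods above can be summarized in one sketch. It is not the EasyCV implementation: `sample_scale` is a hypothetical helper, and sampling the long and short edges independently in "range" mode is an assumption.

```python
import random


def sample_scale(img_scale, multiscale_mode='range', ratio_range=None,
                 rng=random):
    """Return (scale, scale_idx) following the three documented modes."""
    scales = img_scale if isinstance(img_scale, list) else [img_scale]
    if ratio_range is not None:
        # Mode 1: multiply the base scale by a sampled ratio.
        ratio = rng.uniform(*ratio_range)
        base = scales[0]
        return (int(base[0] * ratio), int(base[1] * ratio)), None
    if len(scales) == 1:
        return scales[0], 0
    if multiscale_mode == 'range':
        # Mode 2: sample between the lower and upper bound scales.
        long_edges = [max(s) for s in scales]
        short_edges = [min(s) for s in scales]
        long_edge = rng.randint(min(long_edges), max(long_edges))
        short_edge = rng.randint(min(short_edges), max(short_edges))
        return (long_edge, short_edge), None
    if multiscale_mode == 'value':
        # Mode 3: pick one of the candidate scales.
        idx = rng.randrange(len(scales))
        return scales[idx], idx
    raise ValueError(f'unknown multiscale_mode: {multiscale_mode}')


candidates = [(1333, 400), (1333, 800)]
rng = random.Random(0)
by_value = sample_scale(candidates, 'value', rng=rng)
by_range = sample_scale(candidates, 'range', rng=rng)
by_ratio = sample_scale((640, 640), ratio_range=(0.5, 1.5), rng=rng)
```

Only the "value" mode returns a meaningful index; the other two return None as a placeholder, matching the return types listed above.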
- class easycv.datasets.detection.pipelines.MMRandomFlip(flip_ratio=None, direction='horizontal')[source]¶
Bases: object
Flip the image & bbox & mask.
If the input dict contains the key “flip”, then the flag will be used; otherwise it will be randomly decided by a ratio specified in the init method.
When random flip is enabled, flip_ratio/direction can either be a float/string or tuple of float/string. There are 3 flip modes:
- flip_ratio is float, direction is string: the image will be flipped in the given direction with probability flip_ratio. E.g., with flip_ratio=0.5 and direction='horizontal', the image will be horizontally flipped with probability 0.5.
- flip_ratio is float, direction is list of string: the image will be flipped in each direction[i] with probability flip_ratio/len(direction). E.g., with flip_ratio=0.5 and direction=['horizontal', 'vertical'], the image will be horizontally flipped with probability 0.25 and vertically flipped with probability 0.25.
- flip_ratio is list of float, direction is list of string: given len(flip_ratio) == len(direction), the image will be flipped in direction[i] with probability flip_ratio[i]. E.g., with flip_ratio=[0.3, 0.5] and direction=['horizontal', 'vertical'], the image will be horizontally flipped with probability 0.3 and vertically flipped with probability 0.5.
- Parameters
flip_ratio (float | list[float], optional) – The flipping probability. Default: None.
direction (str | list[str], optional) – The flipping direction. Options are ‘horizontal’, ‘vertical’, ‘diagonal’. Default: ‘horizontal’. If input is a list, its length must equal the length of flip_ratio (when flip_ratio is also a list). Each element in flip_ratio indicates the flip probability of the corresponding direction.
- __init__(flip_ratio=None, direction='horizontal')[source]¶
Initialize self. See help(type(self)) for accurate signature.
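The three flip modes reduce to one weighted choice among the directions plus a "no flip" outcome. The sketch below illustrates that reading; `choose_flip_direction` is a hypothetical helper, and returning None for "no flip" is a convention chosen here, not something the documentation specifies.

```python
import random


def choose_flip_direction(flip_ratio, direction='horizontal', rng=random):
    """Return the sampled flip direction, or None when no flip happens."""
    if flip_ratio is None:
        return None
    if isinstance(direction, (list, tuple)):
        directions = list(direction)
    else:
        directions = [direction]
    if isinstance(flip_ratio, (list, tuple)):
        ratios = list(flip_ratio)
    else:
        # A single ratio is shared equally among all directions.
        ratios = [flip_ratio / len(directions)] * len(directions)
    if len(ratios) != len(directions):
        raise ValueError('flip_ratio and direction must have equal length')
    no_flip = max(0.0, 1.0 - sum(ratios))
    return rng.choices(directions + [None], weights=ratios + [no_flip])[0]


rng = random.Random(0)
samples = [choose_flip_direction([0.3, 0.5], ['horizontal', 'vertical'], rng=rng)
           for _ in range(50)]
```

With `flip_ratio=[0.3, 0.5]`, the remaining probability of 0.2 is the chance that the image is left unflipped.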
- bbox_flip(bboxes, img_shape, direction)[source]¶
Flip bboxes in the given direction.
- Parameters
bboxes (numpy.ndarray) – Bounding boxes, shape (…, 4*k)
img_shape (tuple[int]) – Image shape (height, width)
direction (str) – Flip direction. Options are ‘horizontal’, ‘vertical’.
- Returns
Flipped bounding boxes.
- Return type
numpy.ndarray
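The coordinate arithmetic behind a box flip can be shown for a single `(x1, y1, x2, y2)` box. This is a simplified sketch on plain tuples with a hypothetical name, `flip_box`; the documented method works on numpy arrays of shape (…, 4*k).

```python
def flip_box(box, img_shape, direction):
    """Flip one (x1, y1, x2, y2) box inside an image of (height, width)."""
    x1, y1, x2, y2 = box
    height, width = img_shape
    if direction == 'horizontal':
        # Mirror x coordinates; the old right edge becomes the new left edge.
        return (width - x2, y1, width - x1, y2)
    if direction == 'vertical':
        # Mirror y coordinates; the old bottom edge becomes the new top edge.
        return (x1, height - y2, x2, height - y1)
    raise ValueError(f'unsupported direction: {direction}')


flipped_h = flip_box((10, 20, 30, 40), (100, 200), 'horizontal')
flipped_v = flip_box((10, 20, 30, 40), (100, 200), 'vertical')
```

Swapping the two mirrored coordinates keeps `x1 <= x2` and `y1 <= y2` after the flip, and flipping twice in the same direction returns the original box.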
- class easycv.datasets.detection.pipelines.MMPad(size=None, size_divisor=None, pad_to_square=False, pad_val=0)[source]¶
Bases: object
Pad the image & mask.
There are two padding modes: (1) pad to a fixed size and (2) pad to the minimum size that is divisible by some number. Added keys are “pad_shape”, “pad_fixed_size” and “pad_size_divisor”.
- Parameters
size (tuple, optional) – Fixed padding size.
size_divisor (int, optional) – The divisor of padded size.
pad_to_square (bool) – Whether to pad the image into a square. Currently only used for YOLOX. Default: False.
pad_val (float, optional) – Padding value, 0 by default.
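The target shape for each padding mode can be computed with integer arithmetic. In this sketch, `padded_shape` is a hypothetical helper, shapes are (height, width), and the precedence (square first, then fixed size, then divisor) is an assumption.

```python
def padded_shape(img_shape, size=None, size_divisor=None, pad_to_square=False):
    """Return the (height, width) an image would be padded to."""
    height, width = img_shape
    if pad_to_square:
        side = max(height, width)
        return (side, side)
    if size is not None:
        return tuple(size)
    if size_divisor is not None:
        # Round each side up to the next multiple of size_divisor.
        pad_h = -(-height // size_divisor) * size_divisor
        pad_w = -(-width // size_divisor) * size_divisor
        return (pad_h, pad_w)
    return (height, width)


divisible = padded_shape((427, 640), size_divisor=32)
square = padded_shape((427, 640), pad_to_square=True)
```

For a 427x640 image and `size_divisor=32`, the height is rounded up to 448 (14 x 32) while the width, already a multiple of 32, stays at 640.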
- class easycv.datasets.detection.pipelines.MMNormalize(mean, std, to_rgb=True)[source]¶
Bases: object
Normalize the image.
Added key is “img_norm_cfg”.
- Parameters
mean (sequence) – Mean values of 3 channels.
std (sequence) – Std values of 3 channels.
to_rgb (bool) – Whether to convert the image from BGR to RGB, default is true.
- class easycv.datasets.detection.pipelines.LoadImageFromFile(to_float32=False, color_type='color', file_client_args={'backend': 'disk'})[source]¶
Bases: object
Load an image from file.
Required keys are “img_prefix” and “img_info” (a dict that must contain the key “filename”). Added or updated keys are “filename”, “img”, “img_shape”, “ori_shape” (same as img_shape), “pad_shape” (same as img_shape), “scale_factor” (1.0) and “img_norm_cfg” (means=0 and stds=1).
- Parameters
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is an uint8 array. Defaults to False.
color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to ‘color’.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
- class easycv.datasets.detection.pipelines.LoadMultiChannelImageFromFiles(to_float32=False, color_type='unchanged', file_client_args={'backend': 'disk'})[source]¶
Bases: object
Load multi-channel images from a list of separate channel files.
Required keys are “img_prefix” and “img_info” (a dict that must contain the key “filename”, which is expected to be a list of filenames). Added or updated keys are “filename”, “img”, “img_shape”, “ori_shape” (same as img_shape), “pad_shape” (same as img_shape), “scale_factor” (1.0) and “img_norm_cfg” (means=0 and stds=1).
- Parameters
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is an uint8 array. Defaults to False.
color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to ‘unchanged’.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
- class easycv.datasets.detection.pipelines.LoadAnnotations(with_bbox=True, with_label=True, with_mask=False, with_seg=False, poly2mask=True, file_client_args={'backend': 'disk'})[source]¶
Bases: object
Load multiple types of annotations.
- Parameters
with_bbox (bool) – Whether to parse and load the bbox annotation. Default: True.
with_label (bool) – Whether to parse and load the label annotation. Default: True.
with_mask (bool) – Whether to parse and load the mask annotation. Default: False.
with_seg (bool) – Whether to parse and load the semantic segmentation annotation. Default: False.
poly2mask (bool) – Whether to convert the instance masks from polygons to bitmaps. Default: True.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
- class easycv.datasets.detection.pipelines.MMMultiScaleFlipAug(transforms, img_scale=None, scale_factor=None, flip=False, flip_direction='horizontal')[source]¶
Bases: object
Test-time augmentation with multiple scales and flipping.
An example configuration is as follows:
img_scale=[(1333, 400), (1333, 800)],
flip=True,
transforms=[
    dict(type='Resize', keep_ratio=True),
    dict(type='RandomFlip'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='Collect', keys=['img']),
]
After MultiScaleFlipAug with the above configuration, the results are wrapped into lists of the same length as follows:
dict(
    img=[...],
    img_shape=[...],
    scale=[(1333, 400), (1333, 400), (1333, 800), (1333, 800)],
    flip=[False, True, False, True],
    ...
)
- Parameters
transforms (list[dict]) – Transforms to apply in each augmentation.
img_scale (tuple | list[tuple] | None) – Images scales for resizing.
scale_factor (float | list[float] | None) – Scale factors for resizing.
flip (bool) – Whether to apply flip augmentation. Default: False.
flip_direction (str | list[str]) – Flip augmentation directions, options are “horizontal”, “vertical” and “diagonal”. If flip_direction is a list, multiple flip augmentations will be applied. It has no effect when flip == False. Default: “horizontal”.
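The way scales and flips combine into equal-length lists can be reproduced with two nested loops. This sketch only enumerates the (scale, flip) combinations and does not run any transforms; `enumerate_augmentations` is a hypothetical helper, and iterating over scales in the outer loop is inferred from the example output in the class description.

```python
def enumerate_augmentations(img_scale, flip=False, flip_direction='horizontal'):
    """List the scale/flip settings produced for one input image."""
    scales = img_scale if isinstance(img_scale, list) else [img_scale]
    if isinstance(flip_direction, list):
        directions = flip_direction
    else:
        directions = [flip_direction]
    # The unflipped variant always comes first; flips are added on request.
    flip_args = [(False, None)]
    if flip:
        flip_args += [(True, d) for d in directions]
    result = dict(scale=[], flip=[], flip_direction=[])
    for scale in scales:
        for do_flip, direction in flip_args:
            result['scale'].append(scale)
            result['flip'].append(do_flip)
            result['flip_direction'].append(direction)
    return result


augs = enumerate_augmentations([(1333, 400), (1333, 800)], flip=True)
```

Two scales with flipping enabled give four variants, which matches the `scale` and `flip` lists in the example above.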
Submodules¶
easycv.datasets.detection.pipelines.mm_transforms module¶
- class easycv.datasets.detection.pipelines.mm_transforms.MMToTensor[source]¶
Bases: object
Transform image to Tensor.
Required key: ‘img’. Modifies key: ‘img’.
- Parameters
results (dict) – contain all information about training.
- class easycv.datasets.detection.pipelines.mm_transforms.NormalizeTensor(mean, std)[source]¶
Bases: object
Normalize the Tensor image (CxHxW), with mean and std.
Required key: ‘img’. Modifies key: ‘img’.
- Parameters
mean (list[float]) – Mean values of 3 channels.
std (list[float]) – Std values of 3 channels.
- class easycv.datasets.detection.pipelines.mm_transforms.MMMosaic(img_scale=(640, 640), center_ratio_range=(0.5, 1.5), pad_val=114)[source]¶
Bases: object
Mosaic augmentation.
Given 4 images, mosaic transform combines them into one output image. The output image is composed of the parts from each sub-image.
                    mosaic transform
                       center_x
            +------------------------------+
            |       pad        |  pad      |
            |      +-----------+           |
            |      |           |           |
            |      |  image1   |--------+  |
            |      |           |        |  |
            |      |           | image2 |  |
 center_y   |----+-------------+-----------|
            |    |   cropped   |           |
            |pad |   image3    |  image4   |
            |    |             |           |
            +----|-------------+-----------+
                 |             |
                 +-------------+
The mosaic transform steps are as follows:
1. Choose the mosaic center as the intersections of 4 images
2. Get the left top image according to the index, and randomly sample another 3 images from the custom dataset.
3. Sub image will be cropped if image is larger than mosaic patch
- Parameters
img_scale (Sequence[int]) – Image size after mosaic pipeline of single image. Defaults to (640, 640).
center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Defaults to (0.5, 1.5).
pad_val (int) – Pad value. Defaults to 114.
- class easycv.datasets.detection.pipelines.mm_transforms.MMMixUp(img_scale=(640, 640), ratio_range=(0.5, 1.5), flip_ratio=0.5, pad_val=114, max_iters=15, min_bbox_size=5, min_area_ratio=0.2, max_aspect_ratio=20)[source]¶
Bases: object
MixUp data augmentation.
                mixup transform
       +------------------------------+
       | mixup image   |              |
       |      +--------|--------+     |
       |      |        |        |     |
       |---------------+        |     |
       |      |                 |     |
       |      |      image      |     |
       |      |                 |     |
       |      |                 |     |
       |      |-----------------+     |
       |             pad              |
       +------------------------------+
The mixup transform steps are as follows:
1. Another random image is picked by dataset and embedded in the top left patch (after padding and resizing)
2. The target of mixup transform is the weighted average of mixup image and origin image.
- Parameters
img_scale (Sequence[int]) – Image output size after mixup pipeline. Default: (640, 640).
ratio_range (Sequence[float]) – Scale ratio of mixup image. Default: (0.5, 1.5).
flip_ratio (float) – Horizontal flip ratio of mixup image. Default: 0.5.
pad_val (int) – Pad value. Default: 114.
max_iters (int) – The maximum number of iterations. If the number of iterations is greater than max_iters, but gt_bbox is still empty, then the iteration is terminated. Default: 15.
min_bbox_size (float) – Width and height threshold to filter bboxes. If the height or width of a box is smaller than this value, it will be removed. Default: 5.
min_area_ratio (float) – Threshold of area ratio between original bboxes and warped bboxes. If smaller than this value, the box will be removed. Default: 0.2.
max_aspect_ratio (float) – Aspect ratio of width and height threshold to filter bboxes. If max(h/w, w/h) larger than this value, the box will be removed. Default: 20.
- class easycv.datasets.detection.pipelines.mm_transforms.MMRandomAffine(max_rotate_degree=10.0, max_translate_ratio=0.1, scaling_ratio_range=(0.5, 1.5), max_shear_degree=2.0, border=(0, 0), border_val=(114, 114, 114), min_bbox_size=2, min_area_ratio=0.2, max_aspect_ratio=20)[source]¶
Bases: object
Random affine transform data augmentation for YOLOX.
This operation randomly generates an affine transform matrix that includes rotation, translation, shear and scaling transforms.
- Parameters
max_rotate_degree (float) – Maximum degrees of rotation transform. Default: 10.
max_translate_ratio (float) – Maximum ratio of translation. Default: 0.1.
scaling_ratio_range (tuple[float]) – Min and max ratio of scaling transform. Default: (0.5, 1.5).
max_shear_degree (float) – Maximum degrees of shear transform. Default: 2.
border (tuple[int]) – Distance from height and width sides of input image to adjust output shape. Only used in mosaic dataset. Default: (0, 0).
border_val (tuple[int]) – Border padding values of 3 channels. Default: (114, 114, 114).
min_bbox_size (float) – Width and height threshold to filter bboxes. If the height or width of a box is smaller than this value, it will be removed. Default: 2.
min_area_ratio (float) – Threshold of area ratio between original bboxes and warped bboxes. If smaller than this value, the box will be removed. Default: 0.2.
max_aspect_ratio (float) – Aspect ratio of width and height threshold to filter bboxes. If max(h/w, w/h) larger than this value, the box will be removed. Default: 20.
- class easycv.datasets.detection.pipelines.mm_transforms.MMPhotoMetricDistortion(brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18)[source]¶
Bases: object
Apply photometric distortion to the image sequentially; every transformation is applied with a probability of 0.5. Random contrast is applied either second or second to last.
1. random brightness
2. random contrast (mode 0)
3. convert color from BGR to HSV
4. random saturation
5. random hue
6. convert color from HSV to BGR
7. random contrast (mode 1)
8. randomly swap channels
- Parameters
brightness_delta (int) – delta of brightness.
contrast_range (tuple) – range of contrast.
saturation_range (tuple) – range of saturation.
hue_delta (int) – delta of hue.
- class easycv.datasets.detection.pipelines.mm_transforms.MMResize(img_scale=None, multiscale_mode='range', ratio_range=None, keep_ratio=True, bbox_clip_border=True, backend='cv2', override=False)[source]¶
Bases: object
Resize images & bbox & mask.
This transform resizes the input image to some scale. Bboxes and masks are then resized with the same scale factor. If the input dict contains the key “scale”, then the scale in the input dict is used, otherwise the specified scale in the init method is used. If the input dict contains the key “scale_factor” (if MultiScaleFlipAug does not give img_scale but scale_factor), the actual scale will be computed by image shape and scale_factor.
img_scale can either be a tuple (single-scale) or a list of tuples (multi-scale). There are 3 multiscale modes:
- ratio_range is not None: randomly sample a ratio from the ratio range and multiply it with the image scale.
- ratio_range is None and multiscale_mode == "range": randomly sample a scale from the multiscale range.
- ratio_range is None and multiscale_mode == "value": randomly sample a scale from multiple scales.
- Parameters
img_scale (tuple or list[tuple]) – Images scales for resizing.
multiscale_mode (str) – Either “range” or “value”.
ratio_range (tuple[float]) – (min_ratio, max_ratio)
keep_ratio (bool) – Whether to keep the aspect ratio when resizing the image.
bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. Defaults to True.
backend (str) – Image resize backend, choices are ‘cv2’ and ‘pillow’. These two backends generate slightly different results. Defaults to ‘cv2’.
override (bool, optional) – Whether to override scale and scale_factor so as to call resize twice. If True, after the first resizing, the existing scale and scale_factor will be ignored so that a second resizing is allowed. This option is a work-around for resizing multiple times in DETR. Defaults to False.
- __init__(img_scale=None, multiscale_mode='range', ratio_range=None, keep_ratio=True, bbox_clip_border=True, backend='cv2', override=False)[source]¶
Initialize self. See help(type(self)) for accurate signature.
- static random_select(img_scales)[source]¶
Randomly select an img_scale from given candidates.
- Parameters
img_scales (list[tuple]) – Images scales for selection.
- Returns
Returns a tuple (img_scale, scale_idx), where img_scale is the selected image scale and scale_idx is the selected index in the given candidates.
- Return type
(tuple, int)
- static random_sample(img_scales)[source]¶
Randomly sample an img_scale when multiscale_mode=='range'.
- Parameters
img_scales (list[tuple]) – Images scale range for sampling. There must be two tuples in img_scales, which specify the lower and upper bound of image scales.
- Returns
Returns a tuple (img_scale, None), where img_scale is sampled scale and None is just a placeholder to be consistent with random_select().
- Return type
(tuple, None)
- static random_sample_ratio(img_scale, ratio_range)[source]¶
Randomly sample an img_scale when ratio_range is specified.
A ratio will be randomly sampled from the range specified by ratio_range. It is then multiplied with img_scale to generate the sampled scale.
- Parameters
img_scale (tuple) – Image scale base to multiply with the ratio.
ratio_range (tuple[float]) – The minimum and maximum ratio to scale the img_scale.
- Returns
Returns a tuple (scale, None), where scale is the sampled ratio multiplied with img_scale and None is just a placeholder to be consistent with random_select().
- Return type
(tuple, None)
- class easycv.datasets.detection.pipelines.mm_transforms.MMRandomFlip(flip_ratio=None, direction='horizontal')[source]¶
Bases: object
Flip the image & bbox & mask.
If the input dict contains the key “flip”, then the flag will be used; otherwise it will be randomly decided by a ratio specified in the init method.
When random flip is enabled, flip_ratio/direction can either be a float/string or tuple of float/string. There are 3 flip modes:
- flip_ratio is float, direction is string: the image will be flipped in the given direction with probability flip_ratio. E.g., with flip_ratio=0.5 and direction='horizontal', the image will be horizontally flipped with probability 0.5.
- flip_ratio is float, direction is list of string: the image will be flipped in each direction[i] with probability flip_ratio/len(direction). E.g., with flip_ratio=0.5 and direction=['horizontal', 'vertical'], the image will be horizontally flipped with probability 0.25 and vertically flipped with probability 0.25.
- flip_ratio is list of float, direction is list of string: given len(flip_ratio) == len(direction), the image will be flipped in direction[i] with probability flip_ratio[i]. E.g., with flip_ratio=[0.3, 0.5] and direction=['horizontal', 'vertical'], the image will be horizontally flipped with probability 0.3 and vertically flipped with probability 0.5.
- Parameters
flip_ratio (float | list[float], optional) – The flipping probability. Default: None.
direction (str | list[str], optional) – The flipping direction. Options are ‘horizontal’, ‘vertical’, ‘diagonal’. Default: ‘horizontal’. If input is a list, its length must equal the length of flip_ratio (when flip_ratio is also a list). Each element in flip_ratio indicates the flip probability of the corresponding direction.
- __init__(flip_ratio=None, direction='horizontal')[source]¶
Initialize self. See help(type(self)) for accurate signature.
- bbox_flip(bboxes, img_shape, direction)[source]¶
Flip bboxes in the given direction.
- Parameters
bboxes (numpy.ndarray) – Bounding boxes, shape (…, 4*k)
img_shape (tuple[int]) – Image shape (height, width)
direction (str) – Flip direction. Options are ‘horizontal’, ‘vertical’.
- Returns
Flipped bounding boxes.
- Return type
numpy.ndarray
- class easycv.datasets.detection.pipelines.mm_transforms.MMPad(size=None, size_divisor=None, pad_to_square=False, pad_val=0)[source]¶
Bases: object
Pad the image & mask.
There are two padding modes: (1) pad to a fixed size and (2) pad to the minimum size that is divisible by some number. Added keys are “pad_shape”, “pad_fixed_size” and “pad_size_divisor”.
- Parameters
size (tuple, optional) – Fixed padding size.
size_divisor (int, optional) – The divisor of padded size.
pad_to_square (bool) – Whether to pad the image into a square. Currently only used for YOLOX. Default: False.
pad_val (float, optional) – Padding value, 0 by default.
- class easycv.datasets.detection.pipelines.mm_transforms.MMNormalize(mean, std, to_rgb=True)[source]¶
Bases: object
Normalize the image.
Added key is “img_norm_cfg”.
- Parameters
mean (sequence) – Mean values of 3 channels.
std (sequence) – Std values of 3 channels.
to_rgb (bool) – Whether to convert the image from BGR to RGB, default is true.
- class easycv.datasets.detection.pipelines.mm_transforms.LoadImageFromFile(to_float32=False, color_type='color', file_client_args={'backend': 'disk'})[source]¶
Bases: object
Load an image from file.
Required keys are “img_prefix” and “img_info” (a dict that must contain the key “filename”). Added or updated keys are “filename”, “img”, “img_shape”, “ori_shape” (same as img_shape), “pad_shape” (same as img_shape), “scale_factor” (1.0) and “img_norm_cfg” (means=0 and stds=1).
- Parameters
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is an uint8 array. Defaults to False.
color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to ‘color’.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
- class easycv.datasets.detection.pipelines.mm_transforms.LoadMultiChannelImageFromFiles(to_float32=False, color_type='unchanged', file_client_args={'backend': 'disk'})[source]¶
Bases: object
Load multi-channel images from a list of separate channel files.
Required keys are “img_prefix” and “img_info” (a dict that must contain the key “filename”, which is expected to be a list of filenames). Added or updated keys are “filename”, “img”, “img_shape”, “ori_shape” (same as img_shape), “pad_shape” (same as img_shape), “scale_factor” (1.0) and “img_norm_cfg” (means=0 and stds=1).
- Parameters
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is an uint8 array. Defaults to False.
color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to ‘unchanged’.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
- class easycv.datasets.detection.pipelines.mm_transforms.LoadAnnotations(with_bbox=True, with_label=True, with_mask=False, with_seg=False, poly2mask=True, file_client_args={'backend': 'disk'})[source]¶
Bases: object
Load multiple types of annotations.
- Parameters
with_bbox (bool) – Whether to parse and load the bbox annotation. Default: True.
with_label (bool) – Whether to parse and load the label annotation. Default: True.
with_mask (bool) – Whether to parse and load the mask annotation. Default: False.
with_seg (bool) – Whether to parse and load the semantic segmentation annotation. Default: False.
poly2mask (bool) – Whether to convert the instance masks from polygons to bitmaps. Default: True.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
- class easycv.datasets.detection.pipelines.mm_transforms.MMMultiScaleFlipAug(transforms, img_scale=None, scale_factor=None, flip=False, flip_direction='horizontal')[source]¶
Bases: object
Test-time augmentation with multiple scales and flipping.
An example configuration is as follows:
img_scale=[(1333, 400), (1333, 800)],
flip=True,
transforms=[
    dict(type='Resize', keep_ratio=True),
    dict(type='RandomFlip'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='Collect', keys=['img']),
]
After MultiScaleFlipAug with the above configuration, the results are wrapped into lists of the same length as follows:
dict(
    img=[...],
    img_shape=[...],
    scale=[(1333, 400), (1333, 400), (1333, 800), (1333, 800)],
    flip=[False, True, False, True],
    ...
)
- Parameters
transforms (list[dict]) – Transforms to apply in each augmentation.
img_scale (tuple | list[tuple] | None) – Images scales for resizing.
scale_factor (float | list[float] | None) – Scale factors for resizing.
flip (bool) – Whether to apply flip augmentation. Default: False.
flip_direction (str | list[str]) – Flip augmentation directions, options are “horizontal”, “vertical” and “diagonal”. If flip_direction is a list, multiple flip augmentations will be applied. It has no effect when flip == False. Default: “horizontal”.