EasyCV DOCUMENTATION¶
EasyCV is an all-in-one computer vision toolbox based on PyTorch, mainly focusing on self-supervised learning, image classification, metric learning, object detection, and more.
Prepare Datasets¶
[Prepare Cifar](#prepare-cifar)
[Prepare Imagenet](#prepare-imagenet)
[Prepare Imagenet-TFrecords](#prepare-imagenet-tfrecords)
[Prepare COCO](#prepare-coco)
[Prepare PAI-Itag detection](#prepare-pai-itag-detection)
Prepare Cifar¶
Download the cifar10 dataset and uncompress the files to data/cifar. The directory structure is as follows:
data/cifar
└── cifar-10-batches-py
├── batches.meta
├── data_batch_1
├── data_batch_2
├── data_batch_3
├── data_batch_4
├── data_batch_5
├── readme.html
├── read.py
└── test_batch
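If you prefer to fetch CIFAR-10 programmatically, a minimal sketch using torchvision (an assumption; any other CIFAR-10 mirror works equally well) downloads and extracts the archive into data/cifar:

```python
# Minimal sketch: download CIFAR-10 into data/cifar with torchvision.
# torchvision extracts the archive to data/cifar/cifar-10-batches-py automatically.
from torchvision.datasets import CIFAR10

CIFAR10(root='data/cifar', train=True, download=True)
CIFAR10(root='data/cifar', train=False, download=True)
```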
Prepare Imagenet¶
Go to the download-url, register an account, and log in.
Download the following files:
Training images (Task 1 & 2). 138GB.
Validation images (all tasks). 6.3GB.
Unzip the downloaded files.
Use this script to generate the data meta files.
Prepare Imagenet-TFrecords¶
Go to the download-url, register an account, and log in.
The dataset is divided into two parts, part0 (79GB) and part1 (75GB); you need to download both of them.
Prepare COCO¶
Download the COCO2017 dataset to data/coco. The directory structure is as follows:
data/coco
├── annotations
├── train2017
└── val2017
Prepare PAI-Itag detection¶
Download the SmallCOCO dataset to data/coco. The directory structure is as follows:
data/coco/
├── train2017
├── train2017_20_local.manifest
├── val2017
└── val2017_20_local.manifest
Replace the train_data and val_data paths in the config file:
sed -i 's#train2017.manifest#train2017_20_local.manifest#g' configs/detection/yolox_coco_pai.py
sed -i 's#val2017.manifest#val2017_20_local.manifest#g' configs/detection/yolox_coco_pai.py
Quick Start¶
Prerequisites¶
python >= 3.6
PyTorch >= 1.5
mmcv >= 1.2.0
nvidia-dali == 0.25.0
Installation¶
Prepare environment¶
Create a conda virtual environment and activate it.
conda create -n ev python=3.6 -y
conda activate ev
Install PyTorch and torchvision
The master branch works with PyTorch 1.5.1 or higher.
conda install pytorch==1.7.0 torchvision==0.8.0 -c pytorch
Install some python dependencies
Replace {cu_version} and {torch_version} with the versions used in your environment.
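If you are unsure which versions you are running, a small sketch (plain PyTorch, not part of EasyCV) prints them in roughly the form the mmcv download index expects:

```python
# Minimal sketch: print the torch and CUDA versions for {torch_version} and {cu_version}.
import torch

print('torch_version:', 'torch' + torch.__version__.split('+')[0])   # e.g. torch1.7.0
cuda_version = torch.version.cuda                                     # e.g. '10.1', or None for CPU-only builds
print('cu_version:', 'cu' + cuda_version.replace('.', '') if cuda_version else 'cpu')
```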
# install mmcv
pip install mmcv-full==1.4.4 -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html

# for example, install mmcv-full for cuda10.1 and pytorch 1.7.0
pip install mmcv-full==1.4.4 -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.7.0/index.html

# install nvidia-dali
pip install http://pai-vision-data-hz.cn-hangzhou.oss-cdn.aliyun-inc.com/third_party/nvidia_dali_cuda100-0.25.0-1535750-py3-none-manylinux2014_x86_64.whl

# install common_io for MaxCompute table read (optional)
pip install https://tfsmoke1.oss-cn-zhangjiakou.aliyuncs.com/tunnel_paiio/common_io/py3/common_io-0.1.0-cp36-cp36m-linux_x86_64.whl
Install EasyCV
You can simply install easycv with the following command:
pip install pai-easycv
or clone the repository and then install it:
git clone https://github.com/Alibaba/EasyCV.git
cd easycv
pip install -r requirements.txt
pip install -v -e .  # or "python setup.py develop"
Install pai_nni and blade_compression
If you want to use model quantization and pruning, you need to install pai_nni and blade_compression with the following commands:
# install torch >= 1.8.0
pip install torch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0

# install mmcv >= 1.3.0 (torch version >= 1.8.0 does not support mmcv version < 1.3.0)
pip install mmcv-full==1.4.4 -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html

# install onnx and pai_nni
pip install onnx
pip install https://pai-nni.oss-cn-zhangjiakou.aliyuncs.com/release/2.5/pai_nni-2.5-py3-none-manylinux1_x86_64.whl

# install blade_compression
pip install http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/third_party/blade_compression-0.0.1-py3-none-any.whl
Verification¶
Simple verification
```python
from easycv.apis import *
```
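If the import succeeds without errors, the installation works. To additionally check the installed version, a small sketch (assuming the package exposes __version__, as most pip packages do):

```python
import easycv

# Assumed attribute: most pip-installed packages, easycv included, expose a version string.
print(easycv.__version__)
```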
You can also verify your installation using the following quick-start examples.
Self-supervised Learning Model Zoo¶
Pretrained models¶
MAE¶
Pretrained on ImageNet dataset.
Config | Epochs | Download |
---|---|---|
mae_vit_base_patch16_8xb64_400e | 400 | model |
mae_vit_base_patch16_8xb64_1600e | 1600 | model |
mae_vit_large_patch16_8xb32_1600e | 1600 | model |
DINO¶
Pretrained on ImageNet dataset.
Config | Epochs | Download |
---|---|---|
dino_deit_small_p16_8xb32_100e | 100 | model - log |
MoBY¶
Pretrained on ImageNet dataset.
Config | Epochs | Download |
---|---|---|
moby_deit_small_p16_4xb128_300e | 300 | model - log |
MoCo V2¶
Pretrained on ImageNet dataset.
Config | Epochs | Download |
---|---|---|
mocov2_resnet50_8xb32_200e | 200 | model - log |
SwAV¶
Pretrained on ImageNet dataset.
Config | Epochs | Download |
---|---|---|
swav_resnet50_8xb32_200e | 200 | model - log |
Benchmarks¶
For detailed usage of benchmark tools, please refer to benchmark README.md.
ImageNet Linear Evaluation¶
Algorithm | Linear Eval Config | Pretrained Config | Top-1 (%) | Download |
---|---|---|---|---|
SwAV | swav_resnet50_8xb2048_20e_feature | swav_resnet50_8xb32_200e | 73.618 | log |
DINO | dino_deit_small_p16_8xb2048_20e_feature | dino_deit_small_p16_8xb32_100e | 71.248 | log |
MoBY | moby_deit_small_p16_8xb2048_30e_feature | moby_deit_small_p16_4xb128_300e | 72.214 | log |
MoCo-v2 | mocov2_resnet50_8xb2048_40e_feature | mocov2_resnet50_8xb32_200e | 66.8 | log |
ImageNet Finetuning¶
Algorithm | Finetune Config | Pretrained Config | Top-1 (%) | Download |
---|---|---|---|---|
MAE | mae_vit_base_patch16_8xb64_100e_lrdecay075_fintune | mae_vit_base_patch16_8xb64_400e | 83.13 | finetune model - log |
MAE | mae_vit_base_patch16_8xb64_100e_lrdecay065_fintune | mae_vit_base_patch16_8xb64_1600e | 83.49 | finetune model - log |
MAE | mae_vit_large_patch16_8xb16_50e_lrdecay075_fintune | mae_vit_large_patch16_8xb32_1600e | 85.49 | finetune model - log |
Detection Model Zoo¶
YOLOX¶
Pretrained on COCO2017 dataset.
Algorithm | Config | mAP val 0.5:0.95 | AP val 50 | Download |
---|---|---|---|---|
YOLOX-s | yolox_s_8xb16_300e_coco | 40.0 | 58.9 | model - log |
YOLOX-m | yolox_m_8xb16_300e_coco | 46.3 | 64.9 | model - log |
YOLOX-l | yolox_l_8xb8_300e_coco | 48.9 | 67.5 | model - log |
YOLOX-x | yolox_x_8xb8_300e_coco | 50.9 | 69.2 | model - log |
YOLOX-tiny | yolox_tiny_8xb16_300e_coco | 31.5 | 49.2 | model - log |
YOLOX-nano | yolox_nano_8xb16_300e_coco | 26.5 | 42.6 | model - log |
Develop¶
1. Code Style¶
We adopt PEP8 as the preferred code style.
We use the following tools for linting and formatting: flake8 (linter), yapf (formatter), isort (import sorting).
Style configurations of yapf and isort can be found in setup.cfg.
We use a pre-commit hook that runs flake8, yapf, seed-isort-config and isort, checks trailing whitespace, fixes end-of-files, and sorts requirements.txt automatically on every commit.
The config for a pre-commit hook is stored in .pre-commit-config.
After you clone the repository, you will need to install and initialize the pre-commit hook.
pip install -r requirements/tests.txt
From the repository folder
pre-commit install
After this, the code linters and formatter will be enforced on every commit.
If you want to use pre-commit to check all the files, you can run
pre-commit run --all-files
If you only want to format and lint your code, you can run
sh scripts/linter.sh
2. Test¶
2.1 Unit test¶
bash scripts/ci_test.sh
2.2 Test data¶
If you add new data, please do the following to commit it to git-lfs before running git commit:
python git-lfs/git_lfs.py add data/test/new_data
python git-lfs/git_lfs.py push
3. Build pip package¶
python setup.py sdist bdist_wheel
self-supervised learning tutorial¶
Data Preparation¶
To download the dataset, please refer to prepare_data.md.
Self-supervised learning supports ImageNet format data (raw and TFRecord).
Imagenet format¶
You can download ImageNet data or use your own unlabeled image data. You should provide a directory containing the images for self-supervised training and a filelist containing the image paths relative to that root directory. For example, the image directory is as follows:
images/
├── 0001.jpg
├── 0002.jpg
├── 0003.jpg
|...
└── 9999.jpg
The content of the filelist is:
0001.jpg
0002.jpg
0003.jpg
...
9999.jpg
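If you need to create this filelist yourself, a minimal sketch (assuming all images sit directly under images/) is:

```python
# Minimal sketch: write one image path per line, relative to the image root.
import os

image_root = 'images'
with open('filelist.txt', 'w') as f:
    for name in sorted(os.listdir(image_root)):
        if name.lower().endswith(('.jpg', '.jpeg', '.png')):
            f.write(name + '\n')
```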
Local & PAI-DSW¶
We use configs/selfsup/mocov2/mocov2_rn50_8xb32_200e_jpg.py as an example config, in which two config variables should be modified:
data_train_list = 'filelist.txt'
data_train_root = 'images'
Training¶
Single gpu:
python tools/train.py \
${CONFIG_PATH} \
--work_dir ${WORK_DIR}
Multi gpus:
bash tools/dist_train.sh \
${NUM_GPUS} \
${CONFIG_PATH} \
--work_dir ${WORK_DIR}
Arguments
NUM_GPUS: number of gpus
CONFIG_PATH: the config file path of a selfsup method
WORK_DIR: your path to save models and logs
Examples:
Edit the data_root path in ${CONFIG_PATH} to your own data path.
GPUS=8
bash tools/dist_train.sh configs/selfsup/mocov2/mocov2_rn50_8xb32_200e_jpg.py $GPUS
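Alternatively, the Python training API shown in the image classification tutorial can be used here as well; a minimal sketch, assuming easycv.tools.train accepts self-supervised configs in the same way:

```python
# Minimal sketch: launch the same training through the Python API
# (see `easycv.tools.train` in the image classification tutorial).
import easycv.tools

config_path = 'configs/selfsup/mocov2/mocov2_rn50_8xb32_200e_jpg.py'
easycv.tools.train(config_path, gpus=8, fp16=False, master_port=29527)
```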
Export model¶
python tools/export.py \
${CONFIG_PATH} \
${CHECKPOINT} \
${EXPORT_PATH}
Arguments
CONFIG_PATH: the config file path of a selfsup method
CHECKPOINT: your checkpoint file of a selfsup method, named as epoch_*.pth
EXPORT_PATH: your path to save the exported model
Examples:
python tools/export.py configs/selfsup/mocov2/mocov2_rn50_8xb32_200e_jpg.py \
work_dirs/selfsup/mocov2/epoch_200.pth \
work_dirs/selfsup/mocov2/epoch_200_export.pth
Feature extract¶
Download test_image
import cv2
from easycv.predictors.feature_extractor import TorchFeatureExtractor
output_ckpt = 'work_dirs/selfsup/mocov2/epoch_200_export.pth'
fe = TorchFeatureExtractor(output_ckpt)
img = cv2.imread('248347732153_1040.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
feature = fe.predict([img])
print(feature[0]['feature'].shape)
yolox tutorial¶
Data preparation¶
To download the dataset, please refer to prepare_data.md.
YOLOX supports both COCO format and PAI-Itag detection format.
COCO format¶
To use COCO data to train detection, you can refer to configs/detection/yolox/yolox_s_8xb16_300e_coco.py for more configuration details.
PAI-Itag detection format¶
To use pai-itag detection format data to train detection, you can refer to configs/detection/yolox/yolox_s_8xb16_300e_coco_pai.py for more configuration details.
Local & PAI-DSW¶
To use COCO format data, use config file configs/detection/yolox/yolox_s_8xb16_300e_coco.py
To use PAI-Itag format data, use config file configs/detection/yolox/yolox_s_8xb16_300e_coco_pai.py
Train¶
Single gpu:
python tools/train.py \
${CONFIG_PATH} \
--work_dir ${WORK_DIR}
Multi gpus:
bash tools/dist_train.sh \
${NUM_GPUS} \
${CONFIG_PATH} \
--work_dir ${WORK_DIR}
Arguments
NUM_GPUS: number of gpus
CONFIG_PATH: the config file path of a detection method
WORK_DIR: your path to save models and logs
Examples:
Edit the data_root path in ${CONFIG_PATH} to your own data path.
GPUS=8
bash tools/dist_train.sh configs/detection/yolox/yolox_s_8xb16_300e_coco.py $GPUS
Evaluation¶
Single gpu:
python tools/eval.py \
${CONFIG_PATH} \
${CHECKPOINT} \
--eval
Multi gpus:
bash tools/dist_test.sh \
${CONFIG_PATH} \
${NUM_GPUS} \
${CHECKPOINT} \
--eval
Arguments
CONFIG_PATH: the config file path of a detection method
NUM_GPUS: number of gpus
CHECKPOINT: the checkpoint file named as epoch_*.pth
Examples:
GPUS=8
bash tools/dist_test.sh configs/detection/yolox/yolox_s_8xb16_300e_coco.py $GPUS work_dirs/detection/yolox/epoch_300.pth --eval
Export model¶
python tools/export.py \
${CONFIG_PATH} \
${CHECKPOINT} \
${EXPORT_PATH}
Arguments
CONFIG_PATH: the config file path of a detection method
CHECKPOINT: your checkpoint file of a detection method, named as epoch_*.pth
EXPORT_PATH: your path to save the exported model
Examples:
python tools/export.py configs/detection/yolox/yolox_s_8xb16_300e_coco.py \
work_dirs/detection/yolox/epoch_300.pth \
work_dirs/detection/yolox/epoch_300_export.pth
Inference¶
Download test_image
import cv2
from easycv.predictors import TorchYoloXPredictor
output_ckpt = 'work_dirs/detection/yolox/epoch_300.pth'
detector = TorchYoloXPredictor(output_ckpt)
img = cv2.imread('000000017627.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
output = detector.predict([img])
print(output)
# visualize image
from matplotlib import pyplot as plt
image = img.copy()
for box, cls_name in zip(output[0]['detection_boxes'], output[0]['detection_class_names']):
    # box is [x1, y1, x2, y2]
    box = [int(b) for b in box]
    image = cv2.rectangle(image, tuple(box[:2]), tuple(box[2:4]), (0, 255, 0), 2)
    cv2.putText(image, cls_name, (box[0], box[1] - 5), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 0, 255), 2)
plt.imshow(image)
plt.show()
image classification tutorial¶
Data Preparation¶
To download the dataset, please refer to prepare_data.md.
Image classification supports Cifar and ImageNet format data (raw and TFRecord).
Cifar¶
To use Cifar data to train classification, you can refer to configs/classification/cifar10/swintiny_b64_5e_jpg.py for more configuration details.
Imagenet format¶
You can also use your self-defined data following the ImageNet format. You should provide a root directory containing the images for classification training and a filelist containing the image paths relative to that root directory. For example, the image root directory is as follows:
images/
├── 0001.jpg
├── 0002.jpg
├── 0003.jpg
|...
└── 9999.jpg
Each line of the filelist consists of two parts separated by a space: the subpath to the image file relative to the image root directory, and the class label string for that image.
0001.jpg label1
0002.jpg label2
0003.jpg label3
...
9999.jpg label9999
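A minimal sketch for producing such a filelist, assuming your images are grouped into one sub-directory per class (this layout is an assumption for illustration, not a requirement of EasyCV):

```python
# Hypothetical layout: images/<class_name>/<image>.jpg
# Writes `subpath label` pairs separated by a space, as described above.
import os

image_root = 'images'
with open('filelist.txt', 'w') as f:
    for class_name in sorted(os.listdir(image_root)):
        class_dir = os.path.join(image_root, class_name)
        if not os.path.isdir(class_dir):
            continue
        for name in sorted(os.listdir(class_dir)):
            f.write(f'{os.path.join(class_name, name)} {class_name}\n')
```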
To use Imagenet format data to train classification, you can refer to configs/classification/imagenet/imagenet_rn50_jpg.py for more configuration details.
Local & PAI-DSW¶
Training¶
Single gpu:
python tools/train.py \
${CONFIG_PATH} \
--work_dir ${WORK_DIR}
Multi gpus:
bash tools/dist_train.sh \
${NUM_GPUS} \
${CONFIG_PATH} \
--work_dir ${WORK_DIR}
Arguments
NUM_GPUS: number of gpus
CONFIG_PATH: the config file path of an image classification method
WORK_DIR: your path to save models and logs
Examples:
Edit the data_root path in ${CONFIG_PATH} to your own data path.
single gpu training:
```shell
python tools/train.py configs/classification/cifar10/swintiny_b64_5e_jpg.py --work_dir work_dirs/classification/cifar10/swintiny --fp16
```
multi gpu training
```shell
GPUS=8
bash tools/dist_train.sh configs/classification/cifar10/swintiny_b64_5e_jpg.py $GPUS --fp16
```
training using python api
```python
import easycv.tools
import os
# config_path can be a local file or http url
config_path = 'configs/classification/cifar10/swintiny_b64_5e_jpg.py'
easycv.tools.train(config_path, gpus=8, fp16=False, master_port=29527)
```
Evaluation¶
Single gpu:
python tools/eval.py \
${CONFIG_PATH} \
${CHECKPOINT} \
--eval
Multi gpus:
bash tools/dist_test.sh \
${CONFIG_PATH} \
${NUM_GPUS} \
${CHECKPOINT} \
--eval
Arguments
CONFIG_PATH: the config file path of an image classification method
NUM_GPUS: number of gpus
CHECKPOINT: the checkpoint file named as epoch_*.pth
Examples:
single gpu evaluation
```shell
python tools/eval.py configs/classification/cifar10/swintiny_b64_5e_jpg.py work_dirs/classification/cifar10/swintiny/epoch_350.pth --eval --fp16
```
multi-gpu evaluation
```shell
GPUS=8
bash tools/dist_test.sh configs/classification/cifar10/swintiny_b64_5e_jpg.py $GPUS work_dirs/classification/cifar10/swintiny/epoch_350.pth --eval --fp16
```
evaluation using python api
```python
import easycv.tools
import os
os.environ['CUDA_VISIBLE_DEVICES']='3,4,5,6'
config_path = 'configs/classification/cifar10/swintiny_b64_5e_jpg.py'
checkpoint_path = 'work_dirs/classification/cifar10/swintiny/epoch_350.pth'
easycv.tools.eval(config_path, checkpoint_path, gpus=8)
```
Export model for inference¶
If SyncBN is configured, replace it with BN in the config file:
```python
# imagenet_rn50.py
model = dict(
...
backbone=dict(
...
norm_cfg=dict(type='BN')), # SyncBN --> BN
...)
```
```shell
python tools/export.py configs/classification/cifar10/swintiny_b64_5e_jpg.py \
work_dirs/classification/cifar10/swintiny/epoch_350.pth \
work_dirs/classification/cifar10/swintiny/epoch_350_export.pth
```
or using python api
```python
import easycv.tools
config_path = './imagenet_rn50.py'
checkpoint_path = 'oss://pai-vision-data-hz/pretrained_models/easycv/resnet/resnet50.pth'
export_path = './resnet50_export.pt'
easycv.tools.export(config_path, checkpoint_path, export_path)
```
Inference¶
Download [test_image](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/data/cifar10/qince_data/predict/aeroplane_s_000004.png)
```python
import cv2
from easycv.predictors.classifier import TorchClassifier
output_ckpt = 'work_dirs/classification/cifar10/swintiny/epoch_350_export.pth'
tcls = TorchClassifier(output_ckpt)
img = cv2.imread('aeroplane_s_000004.png')
# input image should be RGB order
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
output = tcls.predict([img])
print(output)
```
file tutorial¶
The file module of easycv supports operations on both local and oss files. For an introduction to oss, please refer to: https://www.aliyun.com/product/oss .
If you operate on oss files, you need to refer to [access_oss](#access_oss) to authorize oss first.
Support operations¶
access_oss¶
Authorize oss.
Method1:
from easycv.file import io
io.access_oss(
ak_id='your_accesskey_id',
ak_secret='your_accesskey_secret',
hosts='your endpoint' or ['your endpoint1', 'your endpoint2'],
buckets='your bucket' or ['your bucket1', 'your bucket2'])
Method2:
Add the oss config to your local file ~/.ossutilconfig, as follows:
For more oss config information, please refer to: https://help.aliyun.com/document_detail/120072.html
[Credentials]
language = CH
endpoint = your endpoint
accessKeyID = your_accesskey_id
accessKeySecret = your_accesskey_secret
[Bucket-Endpoint]
bucket1 = endpoint1
bucket2 = endpoint2
Then run the following command; the config file will be read by default to authorize oss.
from easycv.file import io
io.access_oss()
open¶
Supports w, wb, a, r, rb modes on oss paths. For local paths, the usage is the same as the Python built-in open.
Example for oss:
For io.access_oss, please refer to [access_oss](#access_oss).
from easycv.file import io
io.access_oss('your oss config')
# Write something to an oss file.
with io.open('oss://bucket_name/demo.txt', 'w') as f:
    f.write("test")
# Read from an oss file.
with io.open('oss://bucket_name/demo.txt', 'r') as f:
    print(f.read())
Example for local:
from easycv.file import io
# Write something to a local file.
with io.open('/your/local/path/demo.txt', 'w') as f:
    f.write("test")
# Read from a local file.
with io.open('/your/local/path/demo.txt', 'r') as f:
    print(f.read())
exists¶
Whether the file exists, same usage as os.path.exists. Supports local and oss paths.
Example for oss:
For io.access_oss, please refer to [access_oss](#access_oss).
from easycv.file import io
io.access_oss('your oss config')
ret = io.exists('oss://bucket_name/dir')
print(ret)
Example for Local:
from easycv.file import io
ret = io.exists('/your/local/dir')
print(ret)
move¶
Move src to dst, same usage as shutil.move. Supports local and oss paths.
Example for oss:
For io.access_oss, please refer to [access_oss](#access_oss).
from easycv.file import io
io.access_oss('your oss config')
# move oss file to local
io.move('oss://bucket_name/file.txt', '/your/local/path/file.txt')
# move oss file to oss
io.move('oss://bucket_name/dir1/file.txt', 'oss://bucket_name/dir2/file.txt')
# move local file to oss
io.move('/your/local/file.txt', 'oss://bucket_name/file.txt')
# move directory
io.move('oss://bucket_name/dir1/', 'oss://bucket_name/dir2/')
Example for local:
from easycv.file import io
# move local file to local
io.move('/your/local/path1/file.txt', '/your/local/path2/file.txt')
# move local dir to local
io.move('/your/local/dir1', '/your/local/dir2')
copy¶
Copy a file from src to dst. Same usage as shutil.copyfile. If you want to copy a directory, please refer to [copytree](#copytree).
Example for oss:
For io.access_oss, please refer to [access_oss](#access_oss).
from easycv.file import io
io.access_oss('your oss config')
# Copy a file from local to oss:
io.copy('/your/local/file.txt', 'oss://bucket/dir/file.txt')
# Copy a oss file to local:
io.copy('oss://bucket/dir/file.txt', '/your/local/file.txt')
# Copy a file from oss to oss:
io.copy('oss://bucket/dir/file.txt', 'oss://bucket/dir/file2.txt')
Example for local:
from easycv.file import io
# Copy a file from local to local:
io.copy('/your/local/path1/file.txt', '/your/local/path2/file.txt')
copytree¶
Copy files recursively from src to dst. Same usage as shutil.copytree. If you want to copy a single file, please use [copy](#copy).
Example for oss:
For io.access_oss, please refer to [access_oss](#access_oss).
from easycv.file import io
io.access_oss('your oss config')
# copy files from local to oss
io.copytree(src='/your/local/dir1', dst='oss://bucket_name/dir2')
# copy files from oss to local
io.copytree(src='oss://bucket_name/dir2', dst='/your/local/dir1')
# copy files from oss to oss
io.copytree(src='oss://bucket_name/dir1', dst='oss://bucket_name/dir2')
Example for local:
from easycv.file import io
# copy files from local to local
io.copytree(src='/your/local/dir1', dst='/your/local/dir2')
listdir¶
List all objects in path. Same usage as os.listdir.
Example for oss:
For io.access_oss, please refer to [access_oss](#access_oss).
from easycv.file import io
io.access_oss('your oss config')
ret = io.listdir('oss://bucket/dir', recursive=True)
print(ret)
Example for local:
from easycv.file import io
ret = io.listdir('/your/local/dir', recursive=True)
print(ret)
remove¶
Remove a file or a directory recursively. Same usage as os.remove or shutil.rmtree.
Example for oss:
For io.access_oss, please refer to [access_oss](#access_oss).
from easycv.file import io
io.access_oss('your oss config')
# Remove a oss file
io.remove('oss://bucket_name/file.txt')
# Remove a oss directory
io.remove('oss://bucket_name/dir/')
Example for local:
from easycv.file import io
# Remove a local file
io.remove('/your/local/path/file.txt')
# Remove a local directory
io.remove('/your/local/dir/')
rmtree¶
Remove a directory recursively, same usage as shutil.rmtree.
Example for oss:
For io.access_oss, please refer to [access_oss](#access_oss).
from easycv.file import io
io.access_oss('your oss config')
io.remove('oss://bucket_name/dir_name/')
Example for local:
from easycv.file import io
io.remove('/your/local/dir/')
makedirs¶
Create directories recursively, same usage as os.makedirs.
Example for oss:
For io.access_oss, please refer to [access_oss](#access_oss).
from easycv.file import io
io.access_oss('your oss config')
io.makedirs('oss://bucket/new_dir/')
Example for local:
from easycv.file import io
io.makedirs('/your/local/new_dir/')
isdir¶
Return whether a path is a directory, same usage as os.path.isdir.
Example for oss:
For io.access_oss, please refer to [access_oss](#access_oss).
from easycv.file import io
io.access_oss('your oss config')  # only needed for oss paths, refer to io.access_oss
ret = io.isdir('oss://bucket/dir/')
print(ret)
Example for local:
from easycv.file import io
ret = io.isdir('your/local/dir/')
print(ret)
isfile¶
Return whether a path is a file, same usage as os.path.isfile.
Example for oss:
For io.access_oss, please refer to [access_oss](#access_oss).
from easycv.file import io
io.access_oss('your oss config')
ret = io.isfile('oss://bucket/file.txt')
print(ret)
Example for local:
from easycv.file import io
ret = io.isfile('/your/local/path/file.txt')
print(ret)
glob¶
Return a list of paths matching a pathname pattern.
Example for oss:
For io.access_oss, please refer to [access_oss](#access_oss).
from easycv.file import io
io.access_oss('your oss config')
ret = io.glob('oss://bucket/dir/*.txt')
print(ret)
Example for local:
from easycv.file import io
ret = io.glob('/your/local/dir/*.txt')
print(ret)
size¶
Get the size of the file at path, same usage as os.path.getsize.
Example for oss:
For io.access_oss, please refer to [access_oss](#access_oss).
from easycv.file import io
io.access_oss('your oss config')
size = io.size('oss://bucket/file.txt')
print(size)
Example for local:
from easycv.file import io
size = io.size('/your/local/path/file.txt')
print(size)
v 0.2.2 (07/04/2022)¶
initial commit & first release
SOTA SSL Algorithms
EasyCV provides state-of-the-art self-supervised learning algorithms based on contrastive learning, such as SimCLR, MoCo V2, SwAV and DINO, as well as MAE based on masked image modeling. We also provide standard benchmark tools for SSL model evaluation.
Vision Transformers
EasyCV aims to provide plenty of vision transformer models trained either with supervised learning or self-supervised learning, such as ViT, Swin Transformer and XCiT. More models will be added in the future.
Functionality & Extensibility
In addition to SSL, EasyCV also supports image classification, object detection and metric learning, and more areas will be supported in the future. Although covering different areas, EasyCV decomposes the framework into components such as dataset, model and running hook, making it easy to add new components and combine them with existing modules. EasyCV provides a simple and comprehensive interface for inference. Additionally, all models are supported on PAI-EAS, where they can be easily deployed as online services with automatic scaling and service monitoring.
Efficiency
EasyCV supports multi-gpu and multi-worker training. It uses DALI to accelerate data IO and preprocessing, and fp16 to accelerate training. For inference optimization, EasyCV exports models using jit script, which can be optimized by PAI-Blade.
easycv.apis package¶
Submodules¶
easycv.apis.export module¶
easycv.apis.test module¶
- easycv.apis.test.single_cpu_test(model, data_loader, mode='test', show=False, out_dir=None, show_score_thr=0.3, **kwargs)[source]¶
- easycv.apis.test.single_gpu_test(model, data_loader, mode='test', use_fp16=False, **kwargs)[source]¶
Test model with a single gpu.
This method tests a model with a single gpu.
- Parameters
model (str) – Model to be tested.
data_loader (nn.Dataloader) – Pytorch data loader.
mode – mode for model to forward
use_fp16 – Use fp16 inference
- Returns
The prediction results.
- Return type
list
- easycv.apis.test.multi_gpu_test(model, data_loader, mode='test', tmpdir=None, gpu_collect=False, use_fp16=False, **kwargs)[source]¶
Test model with multiple gpus.
This method tests model with multiple gpus and collects the results under two different modes: gpu and cpu modes. By setting ‘gpu_collect=True’ it encodes results to gpu tensors and use gpu communication for results collection. On cpu mode it saves the results on different gpus to ‘tmpdir’ and collects them by the rank 0 worker.
- Parameters
model (str) – Model to be tested.
data_loader (nn.Dataloader) – Pytorch data loader.
mode – mode for model to forward
tmpdir (str) – Path of directory to save the temporary results from different gpus under cpu mode.
gpu_collect (bool) – Option to use either gpu or cpu to collect results.
use_fp16 – Use fp16 inference
- Returns
The prediction results.
- Return type
list
easycv.apis.train module¶
- easycv.apis.train.set_random_seed(seed, deterministic=False)[source]¶
Set random seed.
- Parameters
seed (int) – Seed to be used.
deterministic (bool) – Whether to set the deterministic option for CUDNN backend, i.e., set torch.backends.cudnn.deterministic to True and torch.backends.cudnn.benchmark to False. Default: False.
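A minimal usage sketch based on the module path documented above:

```python
# Minimal sketch: fix all random seeds and make CUDNN deterministic for reproducible runs.
from easycv.apis.train import set_random_seed

set_random_seed(42, deterministic=True)
```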
- easycv.apis.train.train_model(model, data_loaders, cfg, distributed=False, timestamp=None, meta=None, use_fp16=False, validate=True, gpu_collect=True)[source]¶
Training API.
- Parameters
model (nn.Module) – user defined model
data_loaders – a list of dataloaders for training data
cfg – config object
distributed – distributed training or not
timestamp – time str formatted as ‘%Y%m%d_%H%M%S’
meta – a dict containing meta data info, such as env_info, seed, iter, epoch
use_fp16 – use fp16 training or not
validate – do evaluation while training
gpu_collect – use gpu collect or cpu collect for tensor gathering
- easycv.apis.train.build_optimizer(model, optimizer_cfg)[source]¶
Build optimizer from configs.
- Parameters
model (nn.Module) – The model with parameters to be optimized.
optimizer_cfg (dict) – The config dict of the optimizer.
- Positional fields are:
type: class name of the optimizer.
lr: base learning rate.
- Optional fields are:
any arguments of the corresponding optimizer type, e.g., weight_decay, momentum, etc.
paramwise_options: a dict with regular expression as keys to match parameter names and a dict containing options as values. Options include 6 fields: lr, lr_mult, momentum, momentum_mult, weight_decay, weight_decay_mult.
- Returns
The initialized optimizer.
- Return type
torch.optim.Optimizer
Example
>>> model = torch.nn.modules.Conv1d(1, 1, 1)
>>> paramwise_options = {
>>>     '(bn|gn)(\d+)?.(weight|bias)': dict(weight_decay_mult=0.1),
>>>     '\Ahead.': dict(lr_mult=10, momentum=0)}
>>> optimizer_cfg = dict(type='SGD', lr=0.01, momentum=0.9,
>>>                      weight_decay=0.0001,
>>>                      paramwise_options=paramwise_options)
>>> optimizer = build_optimizer(model, optimizer_cfg)
easycv.datasets package¶
Subpackages¶
easycv.datasets.classification package¶
- class easycv.datasets.classification.ClsDataset(data_source, pipeline)[source]¶
Bases: Generic[torch.utils.data.dataset.T_co]
Dataset for classification
- Parameters
data_source – data source to parse input data
pipeline – transforms list
- __init__(data_source, pipeline)[source]¶
Initialize self. See help(type(self)) for accurate signature.
- evaluate(results, evaluators, logger=None, topk=(1, 5))[source]¶
evaluate classification task
- Parameters
results – a dict of lists of tensors, including prediction and groundtruth info, where the prediction tensor is NxC, and the same for the groundtruth labels.
evaluators – a list of evaluator
- Returns
a dict of float, different metric values
- Return type
eval_result
- visualize(results, vis_num=10, **kwargs)[source]¶
Visualize the model output on validation data. :param results: A dictionary containing
class: List of length number of test images. img_metas: List of length number of test images,
dict of image meta info, containing filename, img_shape, origin_img_shape and so on.
- Parameters
vis_num – number of images visualized
- Returns: A dictionary containing
images: Visualized images, list of np.ndarray. img_metas: List of length number of test images,
dict of image meta info, containing filename, img_shape, origin_img_shape and so on.
- class easycv.datasets.classification.ClsOdpsDataset(data_source, pipeline, image_key='url_image', label_key='label', **kwargs)[source]¶
Bases: Generic[torch.utils.data.dataset.T_co]
Dataset for rotation prediction
Subpackages¶
easycv.datasets.classification.data_sources package¶
- class easycv.datasets.classification.data_sources.ClsSourceCifar10(root, split)[source]¶
Bases:
object
- CLASSES = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']¶
- class easycv.datasets.classification.data_sources.ClsSourceCifar100(root, split)[source]¶
Bases:
object
- CLASSES = None¶
- class easycv.datasets.classification.data_sources.ClsSourceImageListByClass(root, list_file, m_per_class=2, delimeter=' ', split_huge_listfile_byrank=False, cache_path='data/', max_try=20)[source]¶
Bases:
object
Get the same m_per_class samples by the label idx.
- Parameters
list_file – str / list(str). A str is the path of an input image list file, which contains records formatted as image_path label. A list(str) means multiple image list files, each containing such records.
root – str / list(str), root path for image_path, each list_file will need a root.
m_per_class – num of samples for each class.
delimeter – str, delimeter of each line in the list_file
split_huge_listfile_byrank – Adapt to the situation that the memory cannot fully load a huge amount of data list. If split, data list will be split to each rank.
cache_path – if split_huge_listfile_byrank is true, cache list_file will be saved to cache_path.
max_try – int, max try numbers of reading image
- class easycv.datasets.classification.data_sources.ClsSourceImageList(list_file, root='', delimeter=' ', split_huge_listfile_byrank=False, split_label_balance=False, cache_path='data/', max_try=20)[source]¶
Bases:
object
data source for classification
- Parameters
list_file – str / list(str). A str is the path of an input image list file, which contains records formatted as image_path label. A list(str) means multiple image list files, each containing such records.
root – str / list(str), root path for image_path, each list_file will need a root, if len(root) < len(list_file), we will use root[-1] to fill root list.
delimeter – str, delimeter of each line in the list_file
split_huge_listfile_byrank – Adapt to the situation that the memory cannot fully load a huge amount of data list. If split, data list will be split to each rank.
split_label_balance – if split_huge_listfile_byrank is true, whether split with label balance
cache_path – if split_huge_listfile_byrank is true, cache list_file will be saved to cache_path.
max_try – int, max try numbers of reading image
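A minimal construction sketch based only on the documented signature (the paths are placeholders):

```python
# Sketch: instantiate the data source directly from a filelist and an image root.
from easycv.datasets.classification.data_sources import ClsSourceImageList

# filelist.txt contains `image_path label` records; root is prepended to each image_path.
data_source = ClsSourceImageList(list_file='filelist.txt', root='images/')
```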
- class easycv.datasets.classification.data_sources.ClsSourceImageNetTFRecord(list_file='', root='', file_pattern=None, cache_path='data/cache/', max_try=10)[source]¶
Bases:
object
data source for imagenet tfrecord.
- class easycv.datasets.classification.data_sources.class_list.ClsSourceImageListByClass(root, list_file, m_per_class=2, delimeter=' ', split_huge_listfile_byrank=False, cache_path='data/', max_try=20)[source]¶
Bases:
object
Get the same m_per_class samples by the label idx.
- Parameters
list_file – str / list(str). A str is the path of an input image list file, which contains records formatted as image_path label. A list(str) means multiple image list files, each containing such records.
root – str / list(str), root path for image_path, each list_file will need a root.
m_per_class – num of samples for each class.
delimeter – str, delimeter of each line in the list_file
split_huge_listfile_byrank – Adapt to the situation that the memory cannot fully load a huge amount of data list. If split, data list will be split to each rank.
cache_path – if split_huge_listfile_byrank is true, cache list_file will be saved to cache_path.
max_try – int, max try numbers of reading image
- class easycv.datasets.classification.data_sources.fashiongen_h5.FashionGenH5(h5file_path, return_label=True, cache_path='data/fashionGenH5')[source]¶
Bases:
object
- class easycv.datasets.classification.data_sources.image_list.ClsSourceImageList(list_file, root='', delimeter=' ', split_huge_listfile_byrank=False, split_label_balance=False, cache_path='data/', max_try=20)[source]¶
Bases:
object
data source for classification
- Parameters
list_file – str / list(str). A str is the path of an input image list file, which contains records formatted as image_path label. A list(str) means multiple image list files, each containing such records.
root – str / list(str), root path for image_path, each list_file will need a root, if len(root) < len(list_file), we will use root[-1] to fill root list.
delimeter – str, delimeter of each line in the list_file
split_huge_listfile_byrank – Adapt to the situation that the memory cannot fully load a huge amount of data list. If split, data list will be split to each rank.
split_label_balance – if split_huge_listfile_byrank is true, whether split with label balance
cache_path – if split_huge_listfile_byrank is true, cache list_file will be saved to cache_path.
max_try – int, max try numbers of reading image
easycv.datasets.classification.pipelines package¶
- class easycv.datasets.classification.pipelines.MMAutoAugment(policies=[[{'type': 'Posterize', 'bits': 4, 'prob': 0.4}, {'type': 'Rotate', 'angle': 30.0, 'prob': 0.6}], [{'type': 'Solarize', 'thr': 113.77777777777777, 'prob': 0.6}, {'type': 'AutoContrast', 'prob': 0.6}], [{'type': 'Equalize', 'prob': 0.8}, {'type': 'Equalize', 'prob': 0.6}], [{'type': 'Posterize', 'bits': 5, 'prob': 0.6}, {'type': 'Posterize', 'bits': 5, 'prob': 0.6}], [{'type': 'Equalize', 'prob': 0.4}, {'type': 'Solarize', 'thr': 142.22222222222223, 'prob': 0.2}], [{'type': 'Equalize', 'prob': 0.4}, {'type': 'Rotate', 'angle': 26.666666666666668, 'prob': 0.8}], [{'type': 'Solarize', 'thr': 170.66666666666666, 'prob': 0.6}, {'type': 'Equalize', 'prob': 0.6}], [{'type': 'Posterize', 'bits': 6, 'prob': 0.8}, {'type': 'Equalize', 'prob': 1.0}], [{'type': 'Rotate', 'angle': 10.0, 'prob': 0.2}, {'type': 'Solarize', 'thr': 28.444444444444443, 'prob': 0.6}], [{'type': 'Equalize', 'prob': 0.6}, {'type': 'Posterize', 'bits': 5, 'prob': 0.4}], [{'type': 'Rotate', 'angle': 26.666666666666668, 'prob': 0.8}, {'type': 'ColorTransform', 'magnitude': 0.0, 'prob': 0.4}], [{'type': 'Rotate', 'angle': 30.0, 'prob': 0.4}, {'type': 'Equalize', 'prob': 0.6}], [{'type': 'Equalize', 'prob': 0.0}, {'type': 'Equalize', 'prob': 0.8}], [{'type': 'Invert', 'prob': 0.6}, {'type': 'Equalize', 'prob': 1.0}], [{'type': 'ColorTransform', 'magnitude': 0.4, 'prob': 0.6}, {'type': 'Contrast', 'magnitude': 0.8, 'prob': 1.0}], [{'type': 'Rotate', 'angle': 26.666666666666668, 'prob': 0.8}, {'type': 'ColorTransform', 'magnitude': 0.2, 'prob': 1.0}], [{'type': 'ColorTransform', 'magnitude': 0.8, 'prob': 0.8}, {'type': 'Solarize', 'thr': 56.888888888888886, 'prob': 0.8}], [{'type': 'Sharpness', 'magnitude': 0.7, 'prob': 0.4}, {'type': 'Invert', 'prob': 0.6}], [{'type': 'Shear', 'magnitude': 0.16666666666666666, 'prob': 0.6, 'direction': 'horizontal'}, {'type': 'Equalize', 'prob': 1.0}], [{'type': 'ColorTransform', 'magnitude': 0.0, 'prob': 0.4}, {'type': 'Equalize', 'prob': 0.6}], [{'type': 'Equalize', 'prob': 0.4}, {'type': 'Solarize', 'thr': 142.22222222222223, 'prob': 0.2}], [{'type': 'Solarize', 'thr': 113.77777777777777, 'prob': 0.6}, {'type': 'AutoContrast', 'prob': 0.6}], [{'type': 'Invert', 'prob': 0.6}, {'type': 'Equalize', 'prob': 1.0}], [{'type': 'ColorTransform', 'magnitude': 0.4, 'prob': 0.6}, {'type': 'Contrast', 'magnitude': 0.8, 'prob': 1.0}], [{'type': 'Equalize', 'prob': 0.8}, {'type': 'Equalize', 'prob': 0.6}]], hparams={'pad_val': 128})[source]¶
Bases:
object
Auto augmentation. This data augmentation is proposed in AutoAugment: Learning Augmentation Policies from Data.
- Parameters
policies – The policies of auto augmentation. Each policy in policies is a specific augmentation policy composed of several augmentations (dict). When AutoAugment is called, a random policy in policies will be selected to augment images.
hparams (dict) – Configs of hyperparameters. Hyperparameters will be used in policies that require these arguments if these arguments are not set in policy dicts. Defaults to use _HPARAMS_DEFAULT.
- __init__(policies=[[{'type': 'Posterize', 'bits': 4, 'prob': 0.4}, {'type': 'Rotate', 'angle': 30.0, 'prob': 0.6}], [{'type': 'Solarize', 'thr': 113.77777777777777, 'prob': 0.6}, {'type': 'AutoContrast', 'prob': 0.6}], [{'type': 'Equalize', 'prob': 0.8}, {'type': 'Equalize', 'prob': 0.6}], [{'type': 'Posterize', 'bits': 5, 'prob': 0.6}, {'type': 'Posterize', 'bits': 5, 'prob': 0.6}], [{'type': 'Equalize', 'prob': 0.4}, {'type': 'Solarize', 'thr': 142.22222222222223, 'prob': 0.2}], [{'type': 'Equalize', 'prob': 0.4}, {'type': 'Rotate', 'angle': 26.666666666666668, 'prob': 0.8}], [{'type': 'Solarize', 'thr': 170.66666666666666, 'prob': 0.6}, {'type': 'Equalize', 'prob': 0.6}], [{'type': 'Posterize', 'bits': 6, 'prob': 0.8}, {'type': 'Equalize', 'prob': 1.0}], [{'type': 'Rotate', 'angle': 10.0, 'prob': 0.2}, {'type': 'Solarize', 'thr': 28.444444444444443, 'prob': 0.6}], [{'type': 'Equalize', 'prob': 0.6}, {'type': 'Posterize', 'bits': 5, 'prob': 0.4}], [{'type': 'Rotate', 'angle': 26.666666666666668, 'prob': 0.8}, {'type': 'ColorTransform', 'magnitude': 0.0, 'prob': 0.4}], [{'type': 'Rotate', 'angle': 30.0, 'prob': 0.4}, {'type': 'Equalize', 'prob': 0.6}], [{'type': 'Equalize', 'prob': 0.0}, {'type': 'Equalize', 'prob': 0.8}], [{'type': 'Invert', 'prob': 0.6}, {'type': 'Equalize', 'prob': 1.0}], [{'type': 'ColorTransform', 'magnitude': 0.4, 'prob': 0.6}, {'type': 'Contrast', 'magnitude': 0.8, 'prob': 1.0}], [{'type': 'Rotate', 'angle': 26.666666666666668, 'prob': 0.8}, {'type': 'ColorTransform', 'magnitude': 0.2, 'prob': 1.0}], [{'type': 'ColorTransform', 'magnitude': 0.8, 'prob': 0.8}, {'type': 'Solarize', 'thr': 56.888888888888886, 'prob': 0.8}], [{'type': 'Sharpness', 'magnitude': 0.7, 'prob': 0.4}, {'type': 'Invert', 'prob': 0.6}], [{'type': 'Shear', 'magnitude': 0.16666666666666666, 'prob': 0.6, 'direction': 'horizontal'}, {'type': 'Equalize', 'prob': 1.0}], [{'type': 'ColorTransform', 'magnitude': 0.0, 'prob': 0.4}, {'type': 'Equalize', 'prob': 0.6}], [{'type': 'Equalize', 'prob': 0.4}, {'type': 'Solarize', 'thr': 142.22222222222223, 'prob': 0.2}], [{'type': 'Solarize', 'thr': 113.77777777777777, 'prob': 0.6}, {'type': 'AutoContrast', 'prob': 0.6}], [{'type': 'Invert', 'prob': 0.6}, {'type': 'Equalize', 'prob': 1.0}], [{'type': 'ColorTransform', 'magnitude': 0.4, 'prob': 0.6}, {'type': 'Contrast', 'magnitude': 0.8, 'prob': 1.0}], [{'type': 'Equalize', 'prob': 0.8}, {'type': 'Equalize', 'prob': 0.6}]], hparams={'pad_val': 128})[source]¶
Initialize self. See help(type(self)) for accurate signature.
- class easycv.datasets.classification.pipelines.MMRandAugment(num_policies, magnitude_level, magnitude_std=0.0, total_level=30, policies=[{'type': 'AutoContrast'}, {'type': 'Equalize'}, {'type': 'Invert'}, {'type': 'Rotate', 'magnitude_key': 'angle', 'magnitude_range': (0, 30)}, {'type': 'Posterize', 'magnitude_key': 'bits', 'magnitude_range': (4, 0)}, {'type': 'Solarize', 'magnitude_key': 'thr', 'magnitude_range': (256, 0)}, {'type': 'SolarizeAdd', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 110)}, {'type': 'ColorTransform', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.9)}, {'type': 'Contrast', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.9)}, {'type': 'Brightness', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.9)}, {'type': 'Sharpness', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.9)}, {'type': 'Shear', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.3), 'direction': 'horizontal'}, {'type': 'Shear', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.3), 'direction': 'vertical'}, {'type': 'Translate', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.45), 'direction': 'horizontal'}, {'type': 'Translate', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.45), 'direction': 'vertical'}], hparams={'pad_val': 128})[source]¶
Bases:
object
Random augmentation. This data augmentation is proposed in RandAugment: Practical automated data augmentation with a reduced search space.
- Parameters
policies – The policies of random augmentation. Each policy in policies is one specific augmentation policy (dict). The policy shall at least have the key type, indicating the type of augmentation. For those which have a magnitude (named differently in different augmentations), magnitude_key and magnitude_range shall be the magnitude argument (str) and the range of magnitude (a tuple in the format of (val1, val2)), respectively. Note that val1 is not necessarily less than val2.
num_policies (int) – Number of policies to select from policies each time.
magnitude_level (int | float) – Magnitude level for all the augmentation selected.
total_level (int | float) – Total level for the magnitude. Defaults to 30.
magnitude_std (Number | str) –
Deviation of magnitude noise applied. - If positive number, magnitude is sampled from normal distribution
(mean=magnitude, std=magnitude_std).
If 0 or negative number, magnitude remains unchanged.
If str “inf”, magnitude is sampled from uniform distribution (range=[min, magnitude]).
hparams (dict) – Configs of hyperparameters. Hyperparameters will be used in policies that require these arguments if these arguments are not set in policy dicts. Defaults to use _HPARAMS_DEFAULT.
Note
magnitude_std will introduce some randomness to the policy, modified from https://github.com/rwightman/pytorch-image-models. When magnitude_std=0, we calculate the magnitude as follows:
\text{magnitude} = \frac{\text{magnitude\_level}} {\text{total\_level}} \times (\text{val2} - \text{val1}) + \text{val1}
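As a quick numeric check of this formula (a standalone sketch, not EasyCV code):

```python
# magnitude = magnitude_level / total_level * (val2 - val1) + val1
magnitude_level, total_level = 9, 30
val1, val2 = 0.0, 0.9   # e.g. the (0, 0.9) magnitude_range documented for Brightness above
magnitude = magnitude_level / total_level * (val2 - val1) + val1
print(magnitude)        # 0.27
```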
- __init__(num_policies, magnitude_level, magnitude_std=0.0, total_level=30, policies=[{'type': 'AutoContrast'}, {'type': 'Equalize'}, {'type': 'Invert'}, {'type': 'Rotate', 'magnitude_key': 'angle', 'magnitude_range': (0, 30)}, {'type': 'Posterize', 'magnitude_key': 'bits', 'magnitude_range': (4, 0)}, {'type': 'Solarize', 'magnitude_key': 'thr', 'magnitude_range': (256, 0)}, {'type': 'SolarizeAdd', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 110)}, {'type': 'ColorTransform', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.9)}, {'type': 'Contrast', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.9)}, {'type': 'Brightness', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.9)}, {'type': 'Sharpness', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.9)}, {'type': 'Shear', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.3), 'direction': 'horizontal'}, {'type': 'Shear', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.3), 'direction': 'vertical'}, {'type': 'Translate', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.45), 'direction': 'horizontal'}, {'type': 'Translate', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.45), 'direction': 'vertical'}], hparams={'pad_val': 128})[source]¶
Initialize self. See help(type(self)) for accurate signature.
- class easycv.datasets.classification.pipelines.MMRandomErasing(erase_prob=0.5, min_area_ratio=0.02, max_area_ratio=0.4, aspect_range=(0.3, 3.3333333333333335), mode='const', fill_color=(128, 128, 128), fill_std=None)[source]¶
Bases:
object
Randomly selects a rectangle region in an image and erases pixels.
- Parameters
erase_prob – Probability that the image will be randomly erased. Default: 0.5
min_area_ratio (float) – Minimum erased area / input image area Default: 0.02
max_area_ratio (float) – Maximum erased area / input image area Default: 0.4
aspect_range (sequence | float) – Aspect ratio range of erased area. if float, it will be converted to (aspect_ratio, 1/aspect_ratio) Default: (3/10, 10/3)
mode (str) – Fill method in erased area, can be: - const (default): All pixels are assign with the same value. - rand: each pixel is assigned with a random value in [0, 255]
fill_color (sequence | Number) – Base color filled in erased area. Defaults to (128, 128, 128).
fill_std (sequence | Number, optional) – If set and
mode
is ‘rand’, fill erased area with random color from normal distribution (mean=fill_color, std=fill_std); If not set, fill erased area with random color from uniform distribution (0~255). Defaults to None.
Note
See Random Erasing Data Augmentation This paper provided 4 modes: RE-R, RE-M, RE-0, RE-255, and use RE-M as default. The config of these 4 modes are: - RE-R: RandomErasing(mode=’rand’) - RE-M: RandomErasing(mode=’const’, fill_color=(123.67, 116.3, 103.5)) - RE-0: RandomErasing(mode=’const’, fill_color=0) - RE-255: RandomErasing(mode=’const’, fill_color=255)
- easycv.datasets.classification.pipelines.auto_augment.random_negative(value, random_negative_prob)[source]¶
Randomly negate value based on random_negative_prob.
- easycv.datasets.classification.pipelines.auto_augment.merge_hparams(policy: dict, hparams: dict)[source]¶
Merge hyperparameters into policy config. Only merge partial hyperparameters required of the policy. :param policy: Original policy config dict. :type policy: dict :param hparams: Hyperparameters need to be merged. :type hparams: dict
- Returns
Policy config dict after adding
hparams
.- Return type
dict
- class easycv.datasets.classification.pipelines.auto_augment.MMAutoAugment(policies=[[{'type': 'Posterize', 'bits': 4, 'prob': 0.4}, {'type': 'Rotate', 'angle': 30.0, 'prob': 0.6}], [{'type': 'Solarize', 'thr': 113.77777777777777, 'prob': 0.6}, {'type': 'AutoContrast', 'prob': 0.6}], [{'type': 'Equalize', 'prob': 0.8}, {'type': 'Equalize', 'prob': 0.6}], [{'type': 'Posterize', 'bits': 5, 'prob': 0.6}, {'type': 'Posterize', 'bits': 5, 'prob': 0.6}], [{'type': 'Equalize', 'prob': 0.4}, {'type': 'Solarize', 'thr': 142.22222222222223, 'prob': 0.2}], [{'type': 'Equalize', 'prob': 0.4}, {'type': 'Rotate', 'angle': 26.666666666666668, 'prob': 0.8}], [{'type': 'Solarize', 'thr': 170.66666666666666, 'prob': 0.6}, {'type': 'Equalize', 'prob': 0.6}], [{'type': 'Posterize', 'bits': 6, 'prob': 0.8}, {'type': 'Equalize', 'prob': 1.0}], [{'type': 'Rotate', 'angle': 10.0, 'prob': 0.2}, {'type': 'Solarize', 'thr': 28.444444444444443, 'prob': 0.6}], [{'type': 'Equalize', 'prob': 0.6}, {'type': 'Posterize', 'bits': 5, 'prob': 0.4}], [{'type': 'Rotate', 'angle': 26.666666666666668, 'prob': 0.8}, {'type': 'ColorTransform', 'magnitude': 0.0, 'prob': 0.4}], [{'type': 'Rotate', 'angle': 30.0, 'prob': 0.4}, {'type': 'Equalize', 'prob': 0.6}], [{'type': 'Equalize', 'prob': 0.0}, {'type': 'Equalize', 'prob': 0.8}], [{'type': 'Invert', 'prob': 0.6}, {'type': 'Equalize', 'prob': 1.0}], [{'type': 'ColorTransform', 'magnitude': 0.4, 'prob': 0.6}, {'type': 'Contrast', 'magnitude': 0.8, 'prob': 1.0}], [{'type': 'Rotate', 'angle': 26.666666666666668, 'prob': 0.8}, {'type': 'ColorTransform', 'magnitude': 0.2, 'prob': 1.0}], [{'type': 'ColorTransform', 'magnitude': 0.8, 'prob': 0.8}, {'type': 'Solarize', 'thr': 56.888888888888886, 'prob': 0.8}], [{'type': 'Sharpness', 'magnitude': 0.7, 'prob': 0.4}, {'type': 'Invert', 'prob': 0.6}], [{'type': 'Shear', 'magnitude': 0.16666666666666666, 'prob': 0.6, 'direction': 'horizontal'}, {'type': 'Equalize', 'prob': 1.0}], [{'type': 'ColorTransform', 'magnitude': 0.0, 'prob': 0.4}, {'type': 'Equalize', 'prob': 0.6}], [{'type': 'Equalize', 'prob': 0.4}, {'type': 'Solarize', 'thr': 142.22222222222223, 'prob': 0.2}], [{'type': 'Solarize', 'thr': 113.77777777777777, 'prob': 0.6}, {'type': 'AutoContrast', 'prob': 0.6}], [{'type': 'Invert', 'prob': 0.6}, {'type': 'Equalize', 'prob': 1.0}], [{'type': 'ColorTransform', 'magnitude': 0.4, 'prob': 0.6}, {'type': 'Contrast', 'magnitude': 0.8, 'prob': 1.0}], [{'type': 'Equalize', 'prob': 0.8}, {'type': 'Equalize', 'prob': 0.6}]], hparams={'pad_val': 128})[source]¶
Bases:
object
Auto augmentation. This data augmentation is proposed in AutoAugment: Learning Augmentation Policies from Data.
- Parameters
policies – The policies of auto augmentation. Each policy in policies is a specific augmentation policy composed of several augmentations (dict). When AutoAugment is called, a random policy in policies will be selected to augment images.
hparams (dict) – Configs of hyperparameters. Hyperparameters will be used in policies that require these arguments if these arguments are not set in policy dicts. Defaults to use _HPARAMS_DEFAULT.
- __init__(policies=[[{'type': 'Posterize', 'bits': 4, 'prob': 0.4}, {'type': 'Rotate', 'angle': 30.0, 'prob': 0.6}], [{'type': 'Solarize', 'thr': 113.77777777777777, 'prob': 0.6}, {'type': 'AutoContrast', 'prob': 0.6}], [{'type': 'Equalize', 'prob': 0.8}, {'type': 'Equalize', 'prob': 0.6}], [{'type': 'Posterize', 'bits': 5, 'prob': 0.6}, {'type': 'Posterize', 'bits': 5, 'prob': 0.6}], [{'type': 'Equalize', 'prob': 0.4}, {'type': 'Solarize', 'thr': 142.22222222222223, 'prob': 0.2}], [{'type': 'Equalize', 'prob': 0.4}, {'type': 'Rotate', 'angle': 26.666666666666668, 'prob': 0.8}], [{'type': 'Solarize', 'thr': 170.66666666666666, 'prob': 0.6}, {'type': 'Equalize', 'prob': 0.6}], [{'type': 'Posterize', 'bits': 6, 'prob': 0.8}, {'type': 'Equalize', 'prob': 1.0}], [{'type': 'Rotate', 'angle': 10.0, 'prob': 0.2}, {'type': 'Solarize', 'thr': 28.444444444444443, 'prob': 0.6}], [{'type': 'Equalize', 'prob': 0.6}, {'type': 'Posterize', 'bits': 5, 'prob': 0.4}], [{'type': 'Rotate', 'angle': 26.666666666666668, 'prob': 0.8}, {'type': 'ColorTransform', 'magnitude': 0.0, 'prob': 0.4}], [{'type': 'Rotate', 'angle': 30.0, 'prob': 0.4}, {'type': 'Equalize', 'prob': 0.6}], [{'type': 'Equalize', 'prob': 0.0}, {'type': 'Equalize', 'prob': 0.8}], [{'type': 'Invert', 'prob': 0.6}, {'type': 'Equalize', 'prob': 1.0}], [{'type': 'ColorTransform', 'magnitude': 0.4, 'prob': 0.6}, {'type': 'Contrast', 'magnitude': 0.8, 'prob': 1.0}], [{'type': 'Rotate', 'angle': 26.666666666666668, 'prob': 0.8}, {'type': 'ColorTransform', 'magnitude': 0.2, 'prob': 1.0}], [{'type': 'ColorTransform', 'magnitude': 0.8, 'prob': 0.8}, {'type': 'Solarize', 'thr': 56.888888888888886, 'prob': 0.8}], [{'type': 'Sharpness', 'magnitude': 0.7, 'prob': 0.4}, {'type': 'Invert', 'prob': 0.6}], [{'type': 'Shear', 'magnitude': 0.16666666666666666, 'prob': 0.6, 'direction': 'horizontal'}, {'type': 'Equalize', 'prob': 1.0}], [{'type': 'ColorTransform', 'magnitude': 0.0, 'prob': 0.4}, {'type': 'Equalize', 'prob': 0.6}], [{'type': 'Equalize', 'prob': 0.4}, {'type': 'Solarize', 'thr': 142.22222222222223, 'prob': 0.2}], [{'type': 'Solarize', 'thr': 113.77777777777777, 'prob': 0.6}, {'type': 'AutoContrast', 'prob': 0.6}], [{'type': 'Invert', 'prob': 0.6}, {'type': 'Equalize', 'prob': 1.0}], [{'type': 'ColorTransform', 'magnitude': 0.4, 'prob': 0.6}, {'type': 'Contrast', 'magnitude': 0.8, 'prob': 1.0}], [{'type': 'Equalize', 'prob': 0.8}, {'type': 'Equalize', 'prob': 0.6}]], hparams={'pad_val': 128})[source]¶
Initialize self. See help(type(self)) for accurate signature.
- class easycv.datasets.classification.pipelines.auto_augment.MMRandAugment(num_policies, magnitude_level, magnitude_std=0.0, total_level=30, policies=[{'type': 'AutoContrast'}, {'type': 'Equalize'}, {'type': 'Invert'}, {'type': 'Rotate', 'magnitude_key': 'angle', 'magnitude_range': (0, 30)}, {'type': 'Posterize', 'magnitude_key': 'bits', 'magnitude_range': (4, 0)}, {'type': 'Solarize', 'magnitude_key': 'thr', 'magnitude_range': (256, 0)}, {'type': 'SolarizeAdd', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 110)}, {'type': 'ColorTransform', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.9)}, {'type': 'Contrast', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.9)}, {'type': 'Brightness', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.9)}, {'type': 'Sharpness', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.9)}, {'type': 'Shear', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.3), 'direction': 'horizontal'}, {'type': 'Shear', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.3), 'direction': 'vertical'}, {'type': 'Translate', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.45), 'direction': 'horizontal'}, {'type': 'Translate', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.45), 'direction': 'vertical'}], hparams={'pad_val': 128})[source]¶
Bases:
object
Random augmentation. This data augmentation is proposed in RandAugment: Practical automated data augmentation with a reduced search space.
- Parameters
policies – The policies of random augmentation. Each policy in policies is one specific augmentation policy (dict). The policy shall at least have the key type, indicating the type of augmentation. For those which have a magnitude (named differently in different augmentations), magnitude_key and magnitude_range shall be the magnitude argument (str) and the range of magnitude (a tuple in the format of (val1, val2)), respectively. Note that val1 is not necessarily less than val2.
num_policies (int) – Number of policies to select from policies each time.
magnitude_level (int | float) – Magnitude level for all the augmentation selected.
total_level (int | float) – Total level for the magnitude. Defaults to 30.
magnitude_std (Number | str) –
Deviation of magnitude noise applied. - If positive number, magnitude is sampled from normal distribution
(mean=magnitude, std=magnitude_std).
If 0 or negative number, magnitude remains unchanged.
If str “inf”, magnitude is sampled from uniform distribution (range=[min, magnitude]).
hparams (dict) – Configs of hyperparameters. Hyperparameters will be used in policies that require these arguments if these arguments are not set in policy dicts. Defaults to use _HPARAMS_DEFAULT.
Note
magnitude_std will introduce some randomness to the policy, modified from https://github.com/rwightman/pytorch-image-models. When magnitude_std=0, we calculate the magnitude as follows:
\text{magnitude} = \frac{\text{magnitude\_level}} {\text{total\_level}} \times (\text{val2} - \text{val1}) + \text{val1}
- __init__(num_policies, magnitude_level, magnitude_std=0.0, total_level=30, policies=[{'type': 'AutoContrast'}, {'type': 'Equalize'}, {'type': 'Invert'}, {'type': 'Rotate', 'magnitude_key': 'angle', 'magnitude_range': (0, 30)}, {'type': 'Posterize', 'magnitude_key': 'bits', 'magnitude_range': (4, 0)}, {'type': 'Solarize', 'magnitude_key': 'thr', 'magnitude_range': (256, 0)}, {'type': 'SolarizeAdd', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 110)}, {'type': 'ColorTransform', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.9)}, {'type': 'Contrast', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.9)}, {'type': 'Brightness', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.9)}, {'type': 'Sharpness', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.9)}, {'type': 'Shear', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.3), 'direction': 'horizontal'}, {'type': 'Shear', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.3), 'direction': 'vertical'}, {'type': 'Translate', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.45), 'direction': 'horizontal'}, {'type': 'Translate', 'magnitude_key': 'magnitude', 'magnitude_range': (0, 0.45), 'direction': 'vertical'}], hparams={'pad_val': 128})[source]¶
Initialize self. See help(type(self)) for accurate signature.
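As an illustration only (the companion transform names and the concrete values below are assumptions, not taken from an official EasyCV config), MMRandAugment is typically placed in a classification train pipeline as a config dict whose keys follow the signature above:

train_pipeline = [
    dict(type='RandomResizedCrop', size=224),        # assumed companion transform
    dict(
        type='MMRandAugment',
        num_policies=2,            # apply 2 randomly chosen policies per image
        magnitude_level=9,         # scaled by total_level into each policy's magnitude_range
        magnitude_std=0.5,         # sample magnitude from N(magnitude, 0.5)
        total_level=10,
        hparams=dict(pad_val=128)),
    dict(type='MMRandomErasing', erase_prob=0.25),   # assumed companion transform
]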
- class easycv.datasets.classification.pipelines.auto_augment.Shear(magnitude, pad_val=128, prob=0.5, direction='horizontal', random_negative_prob=0.5, interpolation='bicubic')[source]¶
Bases:
object
Shear images.
- Parameters
magnitude (int | float) – The magnitude used for shear.
pad_val – Pixel pad_val value for constant fill. If a sequence of length 3, it is used to pad_val R, G, B channels respectively. Defaults to 128.
prob (float) – The probability for performing Shear therefore should be in range [0, 1]. Defaults to 0.5.
direction (str) – The shearing direction. Options are ‘horizontal’ and ‘vertical’. Defaults to ‘horizontal’.
random_negative_prob (float) – The probability that turns the magnitude negative, which should be in range [0,1]. Defaults to 0.5.
interpolation (str) – Interpolation method. Options are ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’, ‘lanczos’. Defaults to ‘bicubic’.
- class easycv.datasets.classification.pipelines.auto_augment.Translate(magnitude, pad_val=128, prob=0.5, direction='horizontal', random_negative_prob=0.5, interpolation='nearest')[source]¶
Bases:
object
Translate images.
- Parameters
magnitude – The magnitude used for translate. Note that the offset is calculated by magnitude * size in the corresponding direction. With a magnitude of 1, the whole image will be moved out of the range.
pad_val (int, Sequence[int]) – Pixel pad_val value for constant fill. If a sequence of length 3, it is used to pad_val R, G, B channels respectively. Defaults to 128.
prob (float) – The probability for performing translate therefore should be in range [0, 1]. Defaults to 0.5.
direction (str) – The translating direction. Options are ‘horizontal’ and ‘vertical’. Defaults to ‘horizontal’.
random_negative_prob (float) – The probability that turns the magnitude negative, which should be in range [0,1]. Defaults to 0.5.
interpolation (str) – Interpolation method. Options are ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’, ‘lanczos’. Defaults to ‘nearest’.
- class easycv.datasets.classification.pipelines.auto_augment.Rotate(angle, center=None, scale=1.0, pad_val=128, prob=0.5, random_negative_prob=0.5, interpolation='nearest')[source]¶
Bases:
object
Rotate images.
- Parameters
angle – The angle used for rotate. Positive values stand for clockwise rotation.
center (tuple[float], optional) – Center point (w, h) of the rotation in the source image. If None, the center of the image will be used. Defaults to None.
scale (float) – Isotropic scale factor. Defaults to 1.0.
pad_val (int, Sequence[int]) – Pixel pad_val value for constant fill. If a sequence of length 3, it is used to pad_val R, G, B channels respectively. Defaults to 128.
prob (float) – The probability for performing Rotate therefore should be in range [0, 1]. Defaults to 0.5.
random_negative_prob (float) – The probability that turns the angle negative, which should be in range [0,1]. Defaults to 0.5.
interpolation (str) – Interpolation method. Options are ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’, ‘lanczos’. Defaults to ‘nearest’.
- class easycv.datasets.classification.pipelines.auto_augment.AutoContrast(prob=0.5)[source]¶
Bases:
object
Auto adjust image contrast.
- Parameters
prob (float) – The probability for performing auto contrast, therefore should be in range [0, 1]. Defaults to 0.5.
- class easycv.datasets.classification.pipelines.auto_augment.Invert(prob=0.5)[source]¶
Bases:
object
Invert images.
- Parameters
prob (float) – The probability for performing invert, therefore should be in range [0, 1]. Defaults to 0.5.
- class easycv.datasets.classification.pipelines.auto_augment.Equalize(prob=0.5)[source]¶
Bases:
object
Equalize the image histogram.
- Parameters
prob (float) – The probability for performing equalize, therefore should be in range [0, 1]. Defaults to 0.5.
- class easycv.datasets.classification.pipelines.auto_augment.Solarize(thr, prob=0.5)[source]¶
Bases:
object
Solarize images (invert all pixel values above a threshold).
- Parameters
thr – The threshold above which the pixels value will be inverted.
prob (float) – The probability for solarizing therefore should be in range [0, 1]. Defaults to 0.5.
- class easycv.datasets.classification.pipelines.auto_augment.SolarizeAdd(magnitude, thr=128, prob=0.5)[source]¶
Bases:
object
SolarizeAdd images (add a certain value to pixels below a threshold).
- Parameters
magnitude (int | float) – The value to be added to pixels below the thr.
thr – The threshold below which the pixels value will be adjusted.
prob (float) – The probability for solarizing therefore should be in range [0, 1]. Defaults to 0.5.
- class easycv.datasets.classification.pipelines.auto_augment.Posterize(bits, prob=0.5)[source]¶
Bases:
object
Posterize images (reduce the number of bits for each color channel).
- Parameters
bits – Number of bits for each pixel in the output img, which should be less or equal to 8.
prob (float) – The probability for posterizing therefore should be in range [0, 1]. Defaults to 0.5.
- class easycv.datasets.classification.pipelines.auto_augment.Contrast(magnitude, prob=0.5, random_negative_prob=0.5)[source]¶
Bases:
object
Adjust images contrast.
- Parameters
magnitude – The magnitude used for adjusting contrast. A positive magnitude would enhance the contrast and a negative magnitude would make the image grayer. A magnitude=0 gives the origin img.
prob (float) – The probability for performing contrast adjusting therefore should be in range [0, 1]. Defaults to 0.5.
random_negative_prob (float) – The probability that turns the magnitude negative, which should be in range [0,1]. Defaults to 0.5.
- class easycv.datasets.classification.pipelines.auto_augment.ColorTransform(magnitude, prob=0.5, random_negative_prob=0.5)[source]¶
Bases:
object
Adjust images color balance.
- Parameters
magnitude – The magnitude used for color transform. A positive magnitude would enhance the color and a negative magnitude would make the image grayer. A magnitude=0 gives the origin img.
prob (float) – The probability for performing ColorTransform therefore should be in range [0, 1]. Defaults to 0.5.
random_negative_prob (float) – The probability that turns the magnitude negative, which should be in range [0,1]. Defaults to 0.5.
- class easycv.datasets.classification.pipelines.auto_augment.Brightness(magnitude, prob=0.5, random_negative_prob=0.5)[source]¶
Bases:
object
Adjust images brightness.
- Parameters
magnitude – The magnitude used for adjusting brightness. A positive magnitude would enhance the brightness and a negative magnitude would make the image darker. A magnitude=0 gives the origin img.
prob (float) – The probability for performing brightness adjusting, therefore should be in range [0, 1]. Defaults to 0.5.
random_negative_prob (float) – The probability that turns the magnitude negative, which should be in range [0,1]. Defaults to 0.5.
- class easycv.datasets.classification.pipelines.auto_augment.Sharpness(magnitude, prob=0.5, random_negative_prob=0.5)[source]¶
Bases:
object
Adjust images sharpness.
- Parameters
magnitude – The magnitude used for adjusting sharpness. A positive magnitude would enhance the sharpness and a negative magnitude would make the image blurrier. A magnitude=0 gives the origin img.
prob (float) – The probability for performing sharpness adjusting, therefore should be in range [0, 1]. Defaults to 0.5.
random_negative_prob (float) – The probability that turns the magnitude negative, which should be in range [0,1]. Defaults to 0.5.
- class easycv.datasets.classification.pipelines.auto_augment.Cutout(shape, pad_val=128, prob=0.5)[source]¶
Bases:
object
Cutout images.
- Parameters
shape – Expected cutout shape (h, w). If given as a single value, the value will be used for both h and w.
pad_val (int, Sequence[int]) – Pixel pad_val value for constant fill. If it is a sequence, it must have the same length with the image channels. Defaults to 128.
prob (float) – The probability for performing cutout therefore should be in range [0, 1]. Defaults to 0.5.
- class easycv.datasets.classification.pipelines.transform.MMRandomErasing(erase_prob=0.5, min_area_ratio=0.02, max_area_ratio=0.4, aspect_range=(0.3, 3.3333333333333335), mode='const', fill_color=(128, 128, 128), fill_std=None)[source]¶
Bases:
object
Randomly selects a rectangle region in an image and erases its pixels.
- Parameters
erase_prob – Probability that image will be randomly erased. Default: 0.5
min_area_ratio (float) – Minimum erased area / input image area Default: 0.02
max_area_ratio (float) – Maximum erased area / input image area Default: 0.4
aspect_range (sequence | float) – Aspect ratio range of erased area. if float, it will be converted to (aspect_ratio, 1/aspect_ratio) Default: (3/10, 10/3)
mode (str) – Fill method in erased area, can be: - const (default): All pixels are assign with the same value. - rand: each pixel is assigned with a random value in [0, 255]
fill_color (sequence | Number) – Base color filled in erased area. Defaults to (128, 128, 128).
fill_std (sequence | Number, optional) – If set and mode is ‘rand’, fill erased area with random color from normal distribution (mean=fill_color, std=fill_std); If not set, fill erased area with random color from uniform distribution (0~255). Defaults to None.
Note
See Random Erasing Data Augmentation. This paper provides 4 modes: RE-R, RE-M, RE-0, RE-255, and uses RE-M as the default. The configs of these 4 modes are:
- RE-R: RandomErasing(mode=’rand’)
- RE-M: RandomErasing(mode=’const’, fill_color=(123.67, 116.3, 103.5))
- RE-0: RandomErasing(mode=’const’, fill_color=0)
- RE-255: RandomErasing(mode=’const’, fill_color=255)
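The four modes above can be written as config dicts for this class; a minimal sketch (the type name is the class documented here, the values follow the note):

re_r   = dict(type='MMRandomErasing', mode='rand')                                        # RE-R
re_m   = dict(type='MMRandomErasing', mode='const', fill_color=(123.67, 116.3, 103.5))    # RE-M
re_0   = dict(type='MMRandomErasing', mode='const', fill_color=0)                         # RE-0
re_255 = dict(type='MMRandomErasing', mode='const', fill_color=255)                       # RE-255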
Submodules¶
easycv.datasets.classification.odps module¶
- class easycv.datasets.classification.odps.ClsOdpsDataset(data_source, pipeline, image_key='url_image', label_key='label', **kwargs)[source]¶
Bases:
Generic
[torch.utils.data.dataset.T_co
]Dataset for classification based on ODPS (MaxCompute) tables
easycv.datasets.classification.raw module¶
- class easycv.datasets.classification.raw.ClsDataset(data_source, pipeline)[source]¶
Bases:
Generic
[torch.utils.data.dataset.T_co
]Dataset for classification
- Parameters
data_source – data source to parse input data
pipeline – transforms list
- __init__(data_source, pipeline)[source]¶
Initialize self. See help(type(self)) for accurate signature.
- evaluate(results, evaluators, logger=None, topk=(1, 5))[source]¶
evaluate classification task
- Parameters
results – a dict of list of tensor, including prediction and groundtruth info, where the prediction tensor is NxC, and the same holds for the groundtruth labels.
evaluators – a list of evaluator
- Returns
a dict of float, different metric values
- Return type
eval_result
- visualize(results, vis_num=10, **kwargs)[source]¶
Visualize the model output on validation data.
results – A dictionary containing:
class: List of length number of test images.
img_metas: List of length number of test images, dict of image meta info, containing filename, img_shape, origin_img_shape and so on.
- Parameters
vis_num – number of images visualized
- Returns: A dictionary containing
images: Visualized images, list of np.ndarray.
img_metas: List of length number of test images, dict of image meta info, containing filename, img_shape, origin_img_shape and so on.
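A minimal usage sketch of ClsDataset follows; the data source type and the transform names are hypothetical placeholders, only the ClsDataset arguments and the evaluate() call follow the documentation above:

from easycv.datasets.classification.raw import ClsDataset

dataset = ClsDataset(
    data_source=dict(type='ClsSourceImageList',          # hypothetical source config
                     list_file='data/train.txt',
                     root='data/images/'),
    pipeline=[dict(type='RandomResizedCrop', size=224),   # hypothetical transforms
              dict(type='RandomHorizontalFlip')])

# 'results' would come from an inference loop; evaluators is a list of evaluator objects:
# eval_result = dataset.evaluate(results, evaluators=[cls_evaluator], topk=(1, 5))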
easycv.datasets.detection package¶
- class easycv.datasets.detection.DetDataset(data_source, pipeline, profiling=False, classes=None)[source]¶
Bases:
Generic
[torch.utils.data.dataset.T_co
]Dataset for Detection
- __init__(data_source, pipeline, profiling=False, classes=None)[source]¶
- Parameters
data_source – Data_source config dict
pipeline – Pipeline config list
profiling – If set True, will print pipeline time
classes – A list of class names, used in evaluation for result and groundtruth visualization
- evaluate(results, evaluators=None, logger=None)[source]¶
Evaluates the detection boxes.
results – A dictionary containing:
- detection_boxes: List of length number of test images.
Float32 numpy array of shape [num_boxes, 4] and format [ymin, xmin, ymax, xmax] in absolute image coordinates.
- detection_scores: List of length number of test images,
detection scores for the boxes, float32 numpy array of shape [num_boxes].
- detection_classes: List of length number of test images,
integer numpy array of shape [num_boxes] containing 1-indexed detection classes for the boxes.
- img_metas: List of length number of test images,
dict of image meta info, containing filename, img_shape, origin_img_shape, scale_factor and so on.
- Parameters
evaluators – evaluators to calculate metric with results and groundtruth_dict
- visualize(results, vis_num=10, score_thr=0.3, **kwargs)[source]¶
Visualize the model output on validation data.
results – A dictionary containing:
- detection_boxes: List of length number of test images.
Float32 numpy array of shape [num_boxes, 4] and format [ymin, xmin, ymax, xmax] in absolute image coordinates.
- detection_scores: List of length number of test images,
detection scores for the boxes, float32 numpy array of shape [num_boxes].
- detection_classes: List of length number of test images,
integer numpy array of shape [num_boxes] containing 1-indexed detection classes for the boxes.
- img_metas: List of length number of test images,
dict of image meta info, containing filename, img_shape, origin_img_shape, scale_factor and so on.
- Parameters
vis_num – number of images visualized
score_thr – The threshold to filter boxes; boxes with scores greater than score_thr will be kept.
- Returns: A dictionary containing
images: Visualized images.
img_metas: List of length number of test images,
dict of image meta info, containing filename, img_shape, origin_img_shape, scale_factor and so on.
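As an illustration of the layout described above, a results dictionary for a single test image with three detections might look like the following sketch (values are dummies):

import numpy as np

num_boxes = 3
results = {
    'detection_boxes':   [np.zeros((num_boxes, 4), dtype=np.float32)],  # [ymin, xmin, ymax, xmax]
    'detection_scores':  [np.zeros((num_boxes,), dtype=np.float32)],
    'detection_classes': [np.ones((num_boxes,), dtype=np.int32)],       # 1-indexed class ids
    'img_metas':         [dict(filename='xxx.jpg',
                               img_shape=(640, 640, 3),
                               origin_img_shape=(480, 640, 3),
                               scale_factor=1.0)],
}
# vis = dataset.visualize(results, vis_num=10, score_thr=0.3)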
- class easycv.datasets.detection.DetImagesMixDataset(data_source, pipeline, dynamic_scale=None, skip_type_keys=None, profiling=False, classes=None, yolo_format=True, label_padding=True)[source]¶
Bases:
Generic
[torch.utils.data.dataset.T_co
]A wrapper of multiple images mixed dataset.
Suitable for training on multiple images mixed data augmentation like mosaic and mixup. For the augmentation pipeline of mixed image data, the get_indexes method needs to be provided to obtain the image indexes, and you can set skip_flags to change the pipeline running process. At the same time, we provide the dynamic_scale parameter to dynamically change the output image size.
output boxes format: cx, cy, w, h
- Parameters
data_source (DetSourceCoco) – The dataset to be mixed.
pipeline (Sequence[dict]) – Sequence of transform object or config dict to be composed.
dynamic_scale (tuple[int], optional) – The image scale can be changed dynamically. Default to None.
skip_type_keys (list[str], optional) – Sequence of type strings to be skipped in the pipeline. Default to None.
label_padding – out labeling padding [N, 120, 5]
- __init__(data_source, pipeline, dynamic_scale=None, skip_type_keys=None, profiling=False, classes=None, yolo_format=True, label_padding=True)[source]¶
- Parameters
data_source – Data_source config dict
pipeline – Pipeline config list
profiling – If set True, will print pipeline time
classes – A list of class names, used in evaluation for result and groundtruth visualization
- update_skip_type_keys(skip_type_keys)[source]¶
Update skip_type_keys. It is called by an external hook.
- Parameters
skip_type_keys (list[str], optional) – Sequence of type strings to be skipped in the pipeline.
- update_dynamic_scale(dynamic_scale)[source]¶
Update dynamic_scale. It is called by an external hook.
- Parameters
dynamic_scale (tuple[int]) – The image scale can be changed dynamically.
- results2json(results, outfile_prefix)[source]¶
Dump the detection results to a COCO style json file.
There are 3 types of results: proposals, bbox predictions, mask predictions, and they have different data types. This method will automatically recognize the type, and dump them to json files.
- Parameters
results (list[list | tuple | ndarray]) – Testing results of the dataset.
outfile_prefix (str) – The filename prefix of the json files. If the prefix is “somepath/xxx”, the json files will be named “somepath/xxx.bbox.json”, “somepath/xxx.segm.json”, “somepath/xxx.proposal.json”.
- Returns
Possible keys are “bbox”, “segm”, “proposal”, and values are corresponding filenames.
- Return type
dict[str, str]
- format_results(results, jsonfile_prefix=None, **kwargs)[source]¶
Format the results to json (standard format for COCO evaluation).
- Parameters
results (list[tuple | numpy.ndarray]) – Testing results of the dataset.
jsonfile_prefix (str | None) – The prefix of json files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.
- Returns
(result_files, tmp_dir), result_files is a dict containing the json filepaths, tmp_dir is the temporary directory created for saving json files when jsonfile_prefix is not specified.
- Return type
tuple
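A short sketch of dumping predictions with format_results; the jsonfile_prefix value is a placeholder path:

# 'results' is the list of per-image testing results described above
result_files, tmp_dir = dataset.format_results(results, jsonfile_prefix='work_dirs/val')
# result_files maps keys such as 'bbox' to the written json paths, e.g.
# {'bbox': 'work_dirs/val.bbox.json', ...}; tmp_dir is None because a prefix was given.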
Subpackages¶
easycv.datasets.detection.data_sources package¶
- class easycv.datasets.detection.data_sources.DetSourceCoco(ann_file, img_prefix, pipeline, filter_empty_gt=False, classes=None, iscrowd=False)[source]¶
Bases:
object
coco data source
- __init__(ann_file, img_prefix, pipeline, filter_empty_gt=False, classes=None, iscrowd=False)[source]¶
- Parameters
ann_file – Path of annotation file.
img_prefix – coco path prefix
filter_empty_gt – bool, if filter empty gt
iscrowd – set to False when training, set to True when validating
- load_annotations(ann_file)[source]¶
Load annotation from COCO style annotation file.
- Parameters
ann_file (str) – Path of annotation file.
- Returns
Annotation info from COCO api.
- Return type
list[dict]
- get_ann_info(idx)[source]¶
Get COCO annotation by index.
- Parameters
idx (int) – Index of data.
- Returns
Annotation info of specified index.
- Return type
dict
- get_cat_ids(idx)[source]¶
Get COCO category ids by index.
- Parameters
idx (int) – Index of data.
- Returns
All categories in the image of specified index.
- Return type
list[int]
- xyxy2xywh(bbox)[source]¶
Convert xyxy style bounding boxes to xywh style for COCO evaluation.
- Parameters
bbox (numpy.ndarray) – The bounding boxes, shape (4, ), in xyxy order.
- Returns
The converted bounding boxes, in xywh order.
- Return type
list[float]
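A construction sketch for DetSourceCoco under the COCO layout from the data-preparation section; the pipeline entries and paths are illustrative assumptions:

from easycv.datasets.detection.data_sources import DetSourceCoco

source = DetSourceCoco(
    ann_file='data/coco/annotations/instances_train2017.json',
    img_prefix='data/coco/train2017/',
    pipeline=[dict(type='LoadImageFromFile'),
              dict(type='LoadAnnotations', with_bbox=True)],
    filter_empty_gt=False,
    iscrowd=False)   # False for training, True for validation (see __init__ above)

# xyxy2xywh turns corner format [x1, y1, x2, y2] into COCO's [x, y, w, h]:
# box_xywh = source.xyxy2xywh(np.array([10., 20., 110., 220.]))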
- class easycv.datasets.detection.data_sources.DetSourcePAI(path, classes=[], cache_at_init=False, cache_on_the_fly=False, **kwargs)[source]¶
Bases:
easycv.datasets.detection.data_sources.voc.DetSourceVOC
data format please refer to: https://help.aliyun.com/document_detail/311173.html
- __init__(path, classes=[], cache_at_init=False, cache_on_the_fly=False, **kwargs)[source]¶
- Parameters
path – Path of manifest path with pai label format
classes – classes list
cache_at_init – if set True, will cache in memory in __init__ for faster training
cache_on_the_fly – if set True, will cache in memory during training
- class easycv.datasets.detection.data_sources.DetSourceRaw(img_root_path, label_root_path, cache_at_init=False, cache_on_the_fly=False, delimeter=' ', **kwargs)[source]¶
Bases:
easycv.datasets.detection.data_sources.voc.DetSourceVOC
data dir is as follows:
|- data_dir
Label txt file is as follows: the first column is the label id, and columns 2 to 5 are coordinates relative to the image width and height [x_center, y_center, bbox_w, bbox_h], e.g.:
15 0.519398 0.544087 0.476359 0.572061
2 0.501859 0.820726 0.996281 0.332178
…
Example:
- data_source = DetSourceRaw(
img_root_path=’/your/data_dir/images’, label_root_path=’/your/data_dir/labels’,
)
- __init__(img_root_path, label_root_path, cache_at_init=False, cache_on_the_fly=False, delimeter=' ', **kwargs)[source]¶
- Parameters
img_root_path – images dir path
label_root_path – labels dir path
cache_at_init – if set True, will cache in memory in __init__ for faster training
cache_on_the_fly – if set True, will cache in memory during training
- class easycv.datasets.detection.data_sources.DetSourceVOC(path, classes=[], img_root_path=None, label_root_path=None, cache_at_init=False, cache_on_the_fly=False, img_suffix='.jpg', label_suffix='.xml', **kwargs)[source]¶
Bases:
object
data dir is as follows:
|- voc_data
Example1:
- data_source = DetSourceVOC(
path=’/your/voc_data/ImageSets/Main/train.txt’, classes=${VOC_CLASSES},
)
Example2:
- data_source = DetSourceVOC(
path=’/your/voc_data/train.txt’, classes=${VOC_CLASSES}, img_root_path=’/your/voc_data/images’, label_root_path=’/your/voc_data/annotations’
)
- __init__(path, classes=[], img_root_path=None, label_root_path=None, cache_at_init=False, cache_on_the_fly=False, img_suffix='.jpg', label_suffix='.xml', **kwargs)[source]¶
- Parameters
path – path of img id list file in ImageSets/Main/
classes – classes list
img_root_path – image dir path, if None, default to detect the image dir by the relative path of the path according to the VOC data format.
label_root_path – label dir path, if None, default to detect the label dir by the relative path of the path according to the VOC data format.
cache_at_init – if set True, will cache in memory in __init__ for faster training
cache_on_the_fly – if set True, will cache in memory during training
img_suffix – suffix of image file
label_suffix – suffix of label file
- class easycv.datasets.detection.data_sources.coco.DetSourceCoco(ann_file, img_prefix, pipeline, filter_empty_gt=False, classes=None, iscrowd=False)[source]¶
Bases:
object
coco data source
- __init__(ann_file, img_prefix, pipeline, filter_empty_gt=False, classes=None, iscrowd=False)[source]¶
- Parameters
ann_file – Path of annotation file.
img_prefix – coco path prefix
filter_empty_gt – bool, if filter empty gt
iscrowd – set to False when training, set to True when validating
- load_annotations(ann_file)[source]¶
Load annotation from COCO style annotation file.
- Parameters
ann_file (str) – Path of annotation file.
- Returns
Annotation info from COCO api.
- Return type
list[dict]
- get_ann_info(idx)[source]¶
Get COCO annotation by index.
- Parameters
idx (int) – Index of data.
- Returns
Annotation info of specified index.
- Return type
dict
- get_cat_ids(idx)[source]¶
Get COCO category ids by index.
- Parameters
idx (int) – Index of data.
- Returns
All categories in the image of specified index.
- Return type
list[int]
- xyxy2xywh(bbox)[source]¶
Convert xyxy style bounding boxes to xywh style for COCO evaluation.
- Parameters
bbox (numpy.ndarray) – The bounding boxes, shape (4, ), in xyxy order.
- Returns
The converted bounding boxes, in xywh order.
- Return type
list[float]
- easycv.datasets.detection.data_sources.pai_format.get_prior_task_id(keys)[source]¶
The task id that ends with ‘check’ has the highest priority.
- easycv.datasets.detection.data_sources.pai_format.is_itag_v2(row)[source]¶
The keyword of the data source is picUrl in v1, but is source in v2
- class easycv.datasets.detection.data_sources.pai_format.DetSourcePAI(path, classes=[], cache_at_init=False, cache_on_the_fly=False, **kwargs)[source]¶
Bases:
easycv.datasets.detection.data_sources.voc.DetSourceVOC
data format please refer to: https://help.aliyun.com/document_detail/311173.html
- __init__(path, classes=[], cache_at_init=False, cache_on_the_fly=False, **kwargs)[source]¶
- Parameters
path – Path of manifest path with pai label format
classes – classes list
cache_at_init – if set True, will cache in memory in __init__ for faster training
cache_on_the_fly – if set True, will cache in memory during training
- class easycv.datasets.detection.data_sources.raw.DetSourceRaw(img_root_path, label_root_path, cache_at_init=False, cache_on_the_fly=False, delimeter=' ', **kwargs)[source]¶
Bases:
easycv.datasets.detection.data_sources.voc.DetSourceVOC
data dir is as follows:
|- data_dir
Label txt file is as follows: the first column is the label id, and columns 2 to 5 are coordinates relative to the image width and height [x_center, y_center, bbox_w, bbox_h], e.g.:
15 0.519398 0.544087 0.476359 0.572061
2 0.501859 0.820726 0.996281 0.332178
…
Example:
- data_source = DetSourceRaw(
img_root_path=’/your/data_dir/images’, label_root_path=’/your/data_dir/labels’,
)
- __init__(img_root_path, label_root_path, cache_at_init=False, cache_on_the_fly=False, delimeter=' ', **kwargs)[source]¶
- Parameters
img_root_path – images dir path
label_root_path – labels dir path
cache_at_init – if set True, will cache in memory in __init__ for faster training
cache_on_the_fly – if set True, will cache in memory during training
- class easycv.datasets.detection.data_sources.voc.DetSourceVOC(path, classes=[], img_root_path=None, label_root_path=None, cache_at_init=False, cache_on_the_fly=False, img_suffix='.jpg', label_suffix='.xml', **kwargs)[source]¶
Bases:
object
data dir is as follows:
|- voc_data
Example1:
- data_source = DetSourceVOC(
path=’/your/voc_data/ImageSets/Main/train.txt’, classes=${VOC_CLASSES},
)
Example2:
- data_source = DetSourceVOC(
path=’/your/voc_data/train.txt’, classes=${VOC_CLASSES}, img_root_path=’/your/voc_data/images’, label_root_path=’/your/voc_data/annotations’
)
- __init__(path, classes=[], img_root_path=None, label_root_path=None, cache_at_init=False, cache_on_the_fly=False, img_suffix='.jpg', label_suffix='.xml', **kwargs)[source]¶
- Parameters
path – path of img id list file in ImageSets/Main/
classes – classes list
img_root_path – image dir path, if None, default to detect the image dir by the relative path of the path according to the VOC data format.
label_root_path – label dir path, if None, default to detect the label dir by the relative path of the path according to the VOC data format.
cache_at_init – if set True, will cache in memory in __init__ for faster training
cache_on_the_fly – if set True, will cache in memory during training
img_suffix – suffix of image file
label_suffix – suffix of label file
easycv.datasets.detection.pipelines package¶
- class easycv.datasets.detection.pipelines.MMToTensor[source]¶
Bases:
object
Transform image to Tensor.
Required key: ‘img’. Modifies key: ‘img’.
- Parameters
results (dict) – contain all information about training.
- class easycv.datasets.detection.pipelines.NormalizeTensor(mean, std)[source]¶
Bases:
object
Normalize the Tensor image (CxHxW), with mean and std.
Required key: ‘img’. Modifies key: ‘img’.
- Parameters
mean (list[float]) – Mean values of 3 channels.
std (list[float]) – Std values of 3 channels.
- class easycv.datasets.detection.pipelines.MMMosaic(img_scale=(640, 640), center_ratio_range=(0.5, 1.5), pad_val=114)[source]¶
Bases:
object
Mosaic augmentation.
Given 4 images, mosaic transform combines them into one output image. The output image is composed of the parts from each sub- image.
mosaic transform: four images are arranged around a random center (center_x, center_y), with padding and cropping as needed (layout diagram not reproduced in this export).
The mosaic transform steps are as follows:
1. Choose the mosaic center as the intersections of 4 images
2. Get the left top image according to the index, and randomly sample another 3 images from the custom dataset.
3. Sub image will be cropped if image is larger than mosaic patch
- Parameters
img_scale (Sequence[int]) – Image size after mosaic pipeline of single image. Default to (640, 640).
center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Default to (0.5, 1.5).
pad_val (int) – Pad value. Default to 114.
- class easycv.datasets.detection.pipelines.MMMixUp(img_scale=(640, 640), ratio_range=(0.5, 1.5), flip_ratio=0.5, pad_val=114, max_iters=15, min_bbox_size=5, min_area_ratio=0.2, max_aspect_ratio=20)[source]¶
Bases:
object
MixUp data augmentation.
mixup transform: a second image is embedded, after padding and resizing, in the top-left patch of the output image (layout diagram not reproduced in this export).
The mixup transform steps are as follows:
1. Another random image is picked by dataset and embedded in the top left patch (after padding and resizing)
2. The target of mixup transform is the weighted average of mixup image and origin image.
- Parameters
img_scale (Sequence[int]) – Image output size after mixup pipeline. Default: (640, 640).
ratio_range (Sequence[float]) – Scale ratio of mixup image. Default: (0.5, 1.5).
flip_ratio (float) – Horizontal flip ratio of mixup image. Default: 0.5.
pad_val (int) – Pad value. Default: 114.
max_iters (int) – The maximum number of iterations. If the number of iterations is greater than max_iters, but gt_bbox is still empty, then the iteration is terminated. Default: 15.
min_bbox_size (float) – Width and height threshold to filter bboxes. If the height or width of a box is smaller than this value, it will be removed. Default: 5.
min_area_ratio (float) – Threshold of area ratio between original bboxes and wrapped bboxes. If smaller than this value, the box will be removed. Default: 0.2.
max_aspect_ratio (float) – Aspect ratio of width and height threshold to filter bboxes. If max(h/w, w/h) larger than this value, the box will be removed. Default: 20.
- class easycv.datasets.detection.pipelines.MMRandomAffine(max_rotate_degree=10.0, max_translate_ratio=0.1, scaling_ratio_range=(0.5, 1.5), max_shear_degree=2.0, border=(0, 0), border_val=(114, 114, 114), min_bbox_size=2, min_area_ratio=0.2, max_aspect_ratio=20)[source]¶
Bases:
object
Random affine transform data augmentation for YOLOX.
This operation randomly generates an affine transform matrix that includes rotation, translation, shear and scaling transforms.
- Parameters
max_rotate_degree (float) – Maximum degrees of rotation transform. Default: 10.
max_translate_ratio (float) – Maximum ratio of translation. Default: 0.1.
scaling_ratio_range (tuple[float]) – Min and max ratio of scaling transform. Default: (0.5, 1.5).
max_shear_degree (float) – Maximum degrees of shear transform. Default: 2.
border (tuple[int]) – Distance from height and width sides of input image to adjust output shape. Only used in mosaic dataset. Default: (0, 0).
border_val (tuple[int]) – Border padding values of 3 channels. Default: (114, 114, 114).
min_bbox_size (float) – Width and height threshold to filter bboxes. If the height or width of a box is smaller than this value, it will be removed. Default: 2.
min_area_ratio (float) – Threshold of area ratio between original bboxes and wrapped bboxes. If smaller than this value, the box will be removed. Default: 0.2.
max_aspect_ratio (float) – Aspect ratio of width and height threshold to filter bboxes. If max(h/w, w/h) larger than this value, the box will be removed.
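MMMosaic, MMMixUp and MMRandomAffine are typically chained in a YOLOX-style training pipeline; the sketch below is illustrative only, and the concrete values (scaling range, border offset) are assumptions rather than an official EasyCV config:

train_pipeline = [
    dict(type='MMMosaic', img_scale=(640, 640), pad_val=114),
    dict(type='MMRandomAffine',
         scaling_ratio_range=(0.1, 2.0),
         border=(-320, -320)),          # half of the mosaic output size
    dict(type='MMMixUp', img_scale=(640, 640), ratio_range=(0.8, 1.6), pad_val=114),
]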
- class easycv.datasets.detection.pipelines.MMPhotoMetricDistortion(brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18)[source]¶
Bases:
object
Apply photometric distortion to images sequentially; every transformation is applied with a probability of 0.5. Random contrast is applied either second or second to last.
random brightness
random contrast (mode 0)
convert color from BGR to HSV
random saturation
random hue
convert color from HSV to BGR
random contrast (mode 1)
randomly swap channels
- Parameters
brightness_delta (int) – delta of brightness.
contrast_range (tuple) – range of contrast.
saturation_range (tuple) – range of saturation.
hue_delta (int) – delta of hue.
- class easycv.datasets.detection.pipelines.MMResize(img_scale=None, multiscale_mode='range', ratio_range=None, keep_ratio=True, bbox_clip_border=True, backend='cv2', override=False)[source]¶
Bases:
object
Resize images & bbox & mask.
This transform resizes the input image to some scale. Bboxes and masks are then resized with the same scale factor. If the input dict contains the key “scale”, then the scale in the input dict is used, otherwise the specified scale in the init method is used. If the input dict contains the key “scale_factor” (if MultiScaleFlipAug does not give img_scale but scale_factor), the actual scale will be computed by image shape and scale_factor.
img_scale can either be a tuple (single-scale) or a list of tuple (multi-scale). There are 3 multiscale modes:
ratio_range is not None: randomly sample a ratio from the ratio range and multiply it with the image scale.
ratio_range is None and multiscale_mode == "range": randomly sample a scale from the multiscale range.
ratio_range is None and multiscale_mode == "value": randomly sample a scale from multiple scales.
- Parameters
img_scale (tuple or list[tuple]) – Images scales for resizing.
multiscale_mode (str) – Either “range” or “value”.
ratio_range (tuple[float]) – (min_ratio, max_ratio)
keep_ratio (bool) – Whether to keep the aspect ratio when resizing the image.
bbox_clip_border (bool, optional) – Whether clip the objects outside the border of the image. Defaults to True.
backend (str) – Image resize backend, choices are ‘cv2’ and ‘pillow’. These two backends generate slightly different results. Defaults to ‘cv2’.
override (bool, optional) – Whether to override scale and scale_factor so as to call resize twice. Default False. If True, after the first resizing, the existed scale and scale_factor will be ignored so the second resizing can be allowed. This option is a work-around for multiple times of resize in DETR. Defaults to False.
- __init__(img_scale=None, multiscale_mode='range', ratio_range=None, keep_ratio=True, bbox_clip_border=True, backend='cv2', override=False)[source]¶
Initialize self. See help(type(self)) for accurate signature.
- static random_select(img_scales)[source]¶
Randomly select an img_scale from given candidates.
- Parameters
img_scales (list[tuple]) – Images scales for selection.
- Returns
Returns a tuple (img_scale, scale_idx), where img_scale is the selected image scale and scale_idx is the selected index in the given candidates.
- Return type
(tuple, int)
- static random_sample(img_scales)[source]¶
Randomly sample an img_scale when multiscale_mode=='range'.
- Parameters
img_scales (list[tuple]) – Images scale range for sampling. There must be two tuples in img_scales, which specify the lower and upper bound of image scales.
- Returns
Returns a tuple (img_scale, None), where img_scale is the sampled scale and None is just a placeholder to be consistent with random_select().
- Return type
(tuple, None)
- static random_sample_ratio(img_scale, ratio_range)[source]¶
Randomly sample an img_scale when ratio_range is specified.
A ratio will be randomly sampled from the range specified by ratio_range. Then it would be multiplied with img_scale to generate the sampled scale.
- Parameters
img_scale (tuple) – Images scale base to multiply with ratio.
ratio_range (tuple[float]) – The minimum and maximum ratio to scale the img_scale.
- Returns
Returns a tuple (scale, None), where scale is the sampled ratio multiplied with img_scale and None is just a placeholder to be consistent with random_select().
- Return type
(tuple, None)
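The three multiscale modes can be expressed as config dicts; a hedged sketch in which the scale values are arbitrary examples:

# ratio_range is not None: multiply the base scale by a random ratio
resize_ratio = dict(type='MMResize', img_scale=(1333, 800),
                    ratio_range=(0.8, 1.2), keep_ratio=True)
# multiscale_mode='range': sample a scale between the two listed scales
resize_range = dict(type='MMResize', img_scale=[(1333, 640), (1333, 800)],
                    multiscale_mode='range', keep_ratio=True)
# multiscale_mode='value': pick one of the listed scales
resize_value = dict(type='MMResize', img_scale=[(1333, 672), (1333, 736), (1333, 800)],
                    multiscale_mode='value', keep_ratio=True)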
- class easycv.datasets.detection.pipelines.MMRandomFlip(flip_ratio=None, direction='horizontal')[source]¶
Bases:
object
Flip the image & bbox & mask.
If the input dict contains the key “flip”, then the flag will be used, otherwise it will be randomly decided by a ratio specified in the init method.
When random flip is enabled, flip_ratio/direction can either be a float/string or a tuple of float/string. There are 3 flip modes:
flip_ratio is float, direction is string: the image will be flipped along direction with probability of flip_ratio. E.g., flip_ratio=0.5, direction='horizontal', then the image will be horizontally flipped with probability of 0.5.
flip_ratio is float, direction is list of string: the image will be flipped along direction[i] with probability of flip_ratio/len(direction). E.g., flip_ratio=0.5, direction=['horizontal', 'vertical'], then the image will be horizontally flipped with probability of 0.25, vertically with probability of 0.25.
flip_ratio is list of float, direction is list of string: given len(flip_ratio) == len(direction), the image will be flipped along direction[i] with probability of flip_ratio[i]. E.g., flip_ratio=[0.3, 0.5], direction=['horizontal', 'vertical'], then the image will be horizontally flipped with probability of 0.3, vertically with probability of 0.5.
- Parameters
flip_ratio (float | list[float], optional) – The flipping probability. Default: None.
direction (str | list[str], optional) – The flipping direction. Options are ‘horizontal’, ‘vertical’, ‘diagonal’. Default: ‘horizontal’. If input is a list, the length must equal flip_ratio. Each element in flip_ratio indicates the flip probability of corresponding direction.
- __init__(flip_ratio=None, direction='horizontal')[source]¶
Initialize self. See help(type(self)) for accurate signature.
- bbox_flip(bboxes, img_shape, direction)[source]¶
Flip bboxes horizontally.
- Parameters
bboxes (numpy.ndarray) – Bounding boxes, shape (…, 4*k)
img_shape (tuple[int]) – Image shape (height, width)
direction (str) – Flip direction. Options are ‘horizontal’, ‘vertical’.
- Returns
Flipped bounding boxes.
- Return type
numpy.ndarray
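The horizontal case of bbox_flip mirrors the x-coordinates about the image width while leaving y untouched; a standalone sketch of that logic (the real method may differ in off-by-one details):

import numpy as np

def hflip_boxes(bboxes: np.ndarray, img_shape) -> np.ndarray:
    """Mirror boxes of shape (..., 4*k) horizontally inside an image of shape (h, w)."""
    h, w = img_shape[:2]                         # only the width is needed for a horizontal flip
    flipped = bboxes.copy()
    flipped[..., 0::4] = w - bboxes[..., 2::4]   # new x1 = w - old x2
    flipped[..., 2::4] = w - bboxes[..., 0::4]   # new x2 = w - old x1
    return flipped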
- class easycv.datasets.detection.pipelines.MMPad(size=None, size_divisor=None, pad_to_square=False, pad_val=0)[source]¶
Bases:
object
Pad the image & mask.
There are two padding modes: (1) pad to a fixed size and (2) pad to the minimum size that is divisible by some number. Added keys are “pad_shape”, “pad_fixed_size”, “pad_size_divisor”,
- Parameters
size (tuple, optional) – Fixed padding size.
size_divisor (int, optional) – The divisor of padded size.
pad_to_square (bool) – Whether to pad the image into a square. Currently only used for YOLOX. Default: False.
pad_val (float, optional) – Padding value, 0 by default.
- class easycv.datasets.detection.pipelines.MMNormalize(mean, std, to_rgb=True)[source]¶
Bases:
object
Normalize the image.
Added key is “img_norm_cfg”.
- Parameters
mean (sequence) – Mean values of 3 channels.
std (sequence) – Std values of 3 channels.
to_rgb (bool) – Whether to convert the image from BGR to RGB, default is true.
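A common normalization config, shown only as an example of the arguments above (the mean/std values are the usual ImageNet statistics, not something mandated by EasyCV):

img_norm_cfg = dict(mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_rgb=True)
normalize = dict(type='MMNormalize', **img_norm_cfg)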
- class easycv.datasets.detection.pipelines.LoadImageFromFile(to_float32=False, color_type='color', file_client_args={'backend': 'disk'})[source]¶
Bases:
object
Load an image from file.
Required keys are “img_prefix” and “img_info” (a dict that must contain the key “filename”). Added or updated keys are “filename”, “img”, “img_shape”, “ori_shape” (same as img_shape), “pad_shape” (same as img_shape), “scale_factor” (1.0) and “img_norm_cfg” (means=0 and stds=1).
- Parameters
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is an uint8 array. Defaults to False.
color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to ‘color’.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
- class easycv.datasets.detection.pipelines.LoadMultiChannelImageFromFiles(to_float32=False, color_type='unchanged', file_client_args={'backend': 'disk'})[source]¶
Bases:
object
Load multi-channel images from a list of separate channel files.
Required keys are “img_prefix” and “img_info” (a dict that must contain the key “filename”, which is expected to be a list of filenames). Added or updated keys are “filename”, “img”, “img_shape”, “ori_shape” (same as img_shape), “pad_shape” (same as img_shape), “scale_factor” (1.0) and “img_norm_cfg” (means=0 and stds=1).
- Parameters
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is an uint8 array. Defaults to False.
color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to ‘color’.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
- class easycv.datasets.detection.pipelines.LoadAnnotations(with_bbox=True, with_label=True, with_mask=False, with_seg=False, poly2mask=True, file_client_args={'backend': 'disk'})[source]¶
Bases:
object
Load multiple types of annotations.
- Parameters
with_bbox (bool) – Whether to parse and load the bbox annotation. Default: True.
with_label (bool) – Whether to parse and load the label annotation. Default: True.
with_mask (bool) – Whether to parse and load the mask annotation. Default: False.
with_seg (bool) – Whether to parse and load the semantic segmentation annotation. Default: False.
poly2mask (bool) – Whether to convert the instance masks from polygons to bitmaps. Default: True.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
- class easycv.datasets.detection.pipelines.MMMultiScaleFlipAug(transforms, img_scale=None, scale_factor=None, flip=False, flip_direction='horizontal')[source]¶
Bases:
object
Test-time augmentation with multiple scales and flipping.
An example configuration is as followed:
img_scale=[(1333, 400), (1333, 800)], flip=True, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict(type='Normalize', **img_norm_cfg), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']), ]
After MultiScaleFLipAug with above configuration, the results are wrapped into lists of the same length as followed:
dict( img=[...], img_shape=[...], scale=[(1333, 400), (1333, 400), (1333, 800), (1333, 800)] flip=[False, True, False, True] ... )
- Parameters
transforms (list[dict]) – Transforms to apply in each augmentation.
img_scale (tuple | list[tuple] | None) – Images scales for resizing.
scale_factor (float | list[float] | None) – Scale factors for resizing.
flip (bool) – Whether apply flip augmentation. Default: False.
flip_direction (str | list[str]) – Flip augmentation directions, options are “horizontal”, “vertical” and “diagonal”. If flip_direction is a list, multiple flip augmentations will be applied. It has no effect when flip == False. Default: “horizontal”.
- class easycv.datasets.detection.pipelines.mm_transforms.MMToTensor[source]¶
Bases:
object
Transform image to Tensor.
Required key: ‘img’. Modifies key: ‘img’.
- Parameters
results (dict) – contain all information about training.
- class easycv.datasets.detection.pipelines.mm_transforms.NormalizeTensor(mean, std)[source]¶
Bases:
object
Normalize the Tensor image (CxHxW), with mean and std.
Required key: ‘img’. Modifies key: ‘img’.
- Parameters
mean (list[float]) – Mean values of 3 channels.
std (list[float]) – Std values of 3 channels.
- class easycv.datasets.detection.pipelines.mm_transforms.MMMosaic(img_scale=(640, 640), center_ratio_range=(0.5, 1.5), pad_val=114)[source]¶
Bases:
object
Mosaic augmentation.
Given 4 images, mosaic transform combines them into one output image. The output image is composed of the parts from each sub- image.
mosaic transform: four images are arranged around a random center (center_x, center_y), with padding and cropping as needed (layout diagram not reproduced in this export).
The mosaic transform steps are as follows:
1. Choose the mosaic center as the intersections of 4 images
2. Get the left top image according to the index, and randomly sample another 3 images from the custom dataset.
3. Sub image will be cropped if image is larger than mosaic patch
- Parameters
img_scale (Sequence[int]) – Image size after mosaic pipeline of single image. Default to (640, 640).
center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Default to (0.5, 1.5).
pad_val (int) – Pad value. Default to 114.
- class easycv.datasets.detection.pipelines.mm_transforms.MMMixUp(img_scale=(640, 640), ratio_range=(0.5, 1.5), flip_ratio=0.5, pad_val=114, max_iters=15, min_bbox_size=5, min_area_ratio=0.2, max_aspect_ratio=20)[source]¶
Bases:
object
MixUp data augmentation.
mixup transform: a second image is embedded, after padding and resizing, in the top-left patch of the output image (layout diagram not reproduced in this export).
The mixup transform steps are as follows:
1. Another random image is picked by dataset and embedded in the top left patch (after padding and resizing)
2. The target of mixup transform is the weighted average of mixup image and origin image.
- Parameters
img_scale (Sequence[int]) – Image output size after mixup pipeline. Default: (640, 640).
ratio_range (Sequence[float]) – Scale ratio of mixup image. Default: (0.5, 1.5).
flip_ratio (float) – Horizontal flip ratio of mixup image. Default: 0.5.
pad_val (int) – Pad value. Default: 114.
max_iters (int) – The maximum number of iterations. If the number of iterations is greater than max_iters, but gt_bbox is still empty, then the iteration is terminated. Default: 15.
min_bbox_size (float) – Width and height threshold to filter bboxes. If the height or width of a box is smaller than this value, it will be removed. Default: 5.
min_area_ratio (float) – Threshold of area ratio between original bboxes and wrapped bboxes. If smaller than this value, the box will be removed. Default: 0.2.
max_aspect_ratio (float) – Aspect ratio of width and height threshold to filter bboxes. If max(h/w, w/h) larger than this value, the box will be removed. Default: 20.
- class easycv.datasets.detection.pipelines.mm_transforms.MMRandomAffine(max_rotate_degree=10.0, max_translate_ratio=0.1, scaling_ratio_range=(0.5, 1.5), max_shear_degree=2.0, border=(0, 0), border_val=(114, 114, 114), min_bbox_size=2, min_area_ratio=0.2, max_aspect_ratio=20)[source]¶
Bases:
object
Random affine transform data augmentation for YOLOX.
This operation randomly generates an affine transform matrix that includes rotation, translation, shear and scaling transforms.
- Parameters
max_rotate_degree (float) – Maximum degrees of rotation transform. Default: 10.
max_translate_ratio (float) – Maximum ratio of translation. Default: 0.1.
scaling_ratio_range (tuple[float]) – Min and max ratio of scaling transform. Default: (0.5, 1.5).
max_shear_degree (float) – Maximum degrees of shear transform. Default: 2.
border (tuple[int]) – Distance from height and width sides of input image to adjust output shape. Only used in mosaic dataset. Default: (0, 0).
border_val (tuple[int]) – Border padding values of 3 channels. Default: (114, 114, 114).
min_bbox_size (float) – Width and height threshold to filter bboxes. If the height or width of a box is smaller than this value, it will be removed. Default: 2.
min_area_ratio (float) – Threshold of area ratio between original bboxes and wrapped bboxes. If smaller than this value, the box will be removed. Default: 0.2.
max_aspect_ratio (float) – Aspect ratio of width and height threshold to filter bboxes. If max(h/w, w/h) larger than this value, the box will be removed.
- class easycv.datasets.detection.pipelines.mm_transforms.MMPhotoMetricDistortion(brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18)[source]¶
Bases:
object
Apply photometric distortion to images sequentially; every transformation is applied with a probability of 0.5. Random contrast is applied either second or second to last.
random brightness
random contrast (mode 0)
convert color from BGR to HSV
random saturation
random hue
convert color from HSV to BGR
random contrast (mode 1)
randomly swap channels
- Parameters
brightness_delta (int) – delta of brightness.
contrast_range (tuple) – range of contrast.
saturation_range (tuple) – range of saturation.
hue_delta (int) – delta of hue.
- class easycv.datasets.detection.pipelines.mm_transforms.MMResize(img_scale=None, multiscale_mode='range', ratio_range=None, keep_ratio=True, bbox_clip_border=True, backend='cv2', override=False)[source]¶
Bases:
object
Resize images & bbox & mask.
This transform resizes the input image to some scale. Bboxes and masks are then resized with the same scale factor. If the input dict contains the key “scale”, then the scale in the input dict is used, otherwise the specified scale in the init method is used. If the input dict contains the key “scale_factor” (if MultiScaleFlipAug does not give img_scale but scale_factor), the actual scale will be computed by image shape and scale_factor.
img_scale can either be a tuple (single-scale) or a list of tuple (multi-scale). There are 3 multiscale modes:
ratio_range is not None: randomly sample a ratio from the ratio range and multiply it with the image scale.
ratio_range is None and multiscale_mode == "range": randomly sample a scale from the multiscale range.
ratio_range is None and multiscale_mode == "value": randomly sample a scale from multiple scales.
- Parameters
img_scale (tuple or list[tuple]) – Images scales for resizing.
multiscale_mode (str) – Either “range” or “value”.
ratio_range (tuple[float]) – (min_ratio, max_ratio)
keep_ratio (bool) – Whether to keep the aspect ratio when resizing the image.
bbox_clip_border (bool, optional) – Whether clip the objects outside the border of the image. Defaults to True.
backend (str) – Image resize backend, choices are ‘cv2’ and ‘pillow’. These two backends generate slightly different results. Defaults to ‘cv2’.
override (bool, optional) – Whether to override scale and scale_factor so as to call resize twice. Default False. If True, after the first resizing, the existed scale and scale_factor will be ignored so the second resizing can be allowed. This option is a work-around for multiple times of resize in DETR. Defaults to False.
- __init__(img_scale=None, multiscale_mode='range', ratio_range=None, keep_ratio=True, bbox_clip_border=True, backend='cv2', override=False)[source]¶
Initialize self. See help(type(self)) for accurate signature.
- static random_select(img_scales)[source]¶
Randomly select an img_scale from given candidates.
- Parameters
img_scales (list[tuple]) – Images scales for selection.
- Returns
Returns a tuple (img_scale, scale_idx), where img_scale is the selected image scale and scale_idx is the selected index in the given candidates.
- Return type
(tuple, int)
- static random_sample(img_scales)[source]¶
Randomly sample an img_scale when multiscale_mode=='range'.
- Parameters
img_scales (list[tuple]) – Images scale range for sampling. There must be two tuples in img_scales, which specify the lower and upper bound of image scales.
- Returns
Returns a tuple (img_scale, None), where img_scale is the sampled scale and None is just a placeholder to be consistent with random_select().
- Return type
(tuple, None)
- static random_sample_ratio(img_scale, ratio_range)[source]¶
Randomly sample an img_scale when ratio_range is specified.
A ratio will be randomly sampled from the range specified by ratio_range. Then it would be multiplied with img_scale to generate the sampled scale.
- Parameters
img_scale (tuple) – Images scale base to multiply with ratio.
ratio_range (tuple[float]) – The minimum and maximum ratio to scale the img_scale.
- Returns
Returns a tuple (scale, None), where scale is the sampled ratio multiplied with img_scale and None is just a placeholder to be consistent with random_select().
- Return type
(tuple, None)
- class easycv.datasets.detection.pipelines.mm_transforms.MMRandomFlip(flip_ratio=None, direction='horizontal')[source]¶
Bases:
object
Flip the image & bbox & mask.
If the input dict contains the key “flip”, then the flag will be used, otherwise it will be randomly decided by a ratio specified in the init method.
When random flip is enabled, flip_ratio/direction can either be a float/string or a tuple of float/string. There are 3 flip modes:
flip_ratio is float, direction is string: the image will be flipped along direction with probability of flip_ratio. E.g., flip_ratio=0.5, direction='horizontal', then the image will be horizontally flipped with probability of 0.5.
flip_ratio is float, direction is list of string: the image will be flipped along direction[i] with probability of flip_ratio/len(direction). E.g., flip_ratio=0.5, direction=['horizontal', 'vertical'], then the image will be horizontally flipped with probability of 0.25, vertically with probability of 0.25.
flip_ratio is list of float, direction is list of string: given len(flip_ratio) == len(direction), the image will be flipped along direction[i] with probability of flip_ratio[i]. E.g., flip_ratio=[0.3, 0.5], direction=['horizontal', 'vertical'], then the image will be horizontally flipped with probability of 0.3, vertically with probability of 0.5.
- Parameters
flip_ratio (float | list[float], optional) – The flipping probability. Default: None.
direction (str | list[str], optional) – The flipping direction. Options are ‘horizontal’, ‘vertical’, ‘diagonal’. Default: ‘horizontal’. If input is a list, the length must equal flip_ratio. Each element in flip_ratio indicates the flip probability of corresponding direction.
- __init__(flip_ratio=None, direction='horizontal')[source]¶
Initialize self. See help(type(self)) for accurate signature.
- bbox_flip(bboxes, img_shape, direction)[source]¶
Flip bboxes horizontally.
- Parameters
bboxes (numpy.ndarray) – Bounding boxes, shape (…, 4*k)
img_shape (tuple[int]) – Image shape (height, width)
direction (str) – Flip direction. Options are ‘horizontal’, ‘vertical’.
- Returns
Flipped bounding boxes.
- Return type
numpy.ndarray
- class easycv.datasets.detection.pipelines.mm_transforms.MMPad(size=None, size_divisor=None, pad_to_square=False, pad_val=0)[source]¶
Bases:
object
Pad the image & mask.
There are two padding modes: (1) pad to a fixed size and (2) pad to the minimum size that is divisible by some number. Added keys are “pad_shape”, “pad_fixed_size”, “pad_size_divisor”,
- Parameters
size (tuple, optional) – Fixed padding size.
size_divisor (int, optional) – The divisor of padded size.
pad_to_square (bool) – Whether to pad the image into a square. Currently only used for YOLOX. Default: False.
pad_val (float, optional) – Padding value, 0 by default.
- class easycv.datasets.detection.pipelines.mm_transforms.MMNormalize(mean, std, to_rgb=True)[source]¶
Bases:
object
Normalize the image.
Added key is “img_norm_cfg”.
- Parameters
mean (sequence) – Mean values of 3 channels.
std (sequence) – Std values of 3 channels.
to_rgb (bool) – Whether to convert the image from BGR to RGB, default is true.
- class easycv.datasets.detection.pipelines.mm_transforms.LoadImageFromFile(to_float32=False, color_type='color', file_client_args={'backend': 'disk'})[source]¶
Bases:
object
Load an image from file.
Required keys are “img_prefix” and “img_info” (a dict that must contain the key “filename”). Added or updated keys are “filename”, “img”, “img_shape”, “ori_shape” (same as img_shape), “pad_shape” (same as img_shape), “scale_factor” (1.0) and “img_norm_cfg” (means=0 and stds=1).
- Parameters
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is an uint8 array. Defaults to False.
color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to ‘color’.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
- class easycv.datasets.detection.pipelines.mm_transforms.LoadMultiChannelImageFromFiles(to_float32=False, color_type='unchanged', file_client_args={'backend': 'disk'})[source]¶
Bases:
object
Load multi-channel images from a list of separate channel files.
Required keys are “img_prefix” and “img_info” (a dict that must contain the key “filename”, which is expected to be a list of filenames). Added or updated keys are “filename”, “img”, “img_shape”, “ori_shape” (same as img_shape), “pad_shape” (same as img_shape), “scale_factor” (1.0) and “img_norm_cfg” (means=0 and stds=1).
- Parameters
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is an uint8 array. Defaults to False.
color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to ‘color’.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
- class easycv.datasets.detection.pipelines.mm_transforms.LoadAnnotations(with_bbox=True, with_label=True, with_mask=False, with_seg=False, poly2mask=True, file_client_args={'backend': 'disk'})[source]¶
Bases:
object
Load multiple types of annotations.
- Parameters
with_bbox (bool) – Whether to parse and load the bbox annotation. Default: True.
with_label (bool) – Whether to parse and load the label annotation. Default: True.
with_mask (bool) – Whether to parse and load the mask annotation. Default: False.
with_seg (bool) – Whether to parse and load the semantic segmentation annotation. Default: False.
poly2mask (bool) – Whether to convert the instance masks from polygons to bitmaps. Default: True.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
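A minimal loading-pipeline sketch assembled only from the transforms documented above; it is illustrative rather than a copy of any EasyCV config, and the normalization statistics are the commonly used ImageNet values (an assumption):
train_loading = [
    dict(type='LoadImageFromFile', to_float32=False),
    dict(type='LoadAnnotations', with_bbox=True, with_label=True),
    dict(type='MMNormalize',
         mean=[123.675, 116.28, 103.53],  # ImageNet statistics, assumed values
         std=[58.395, 57.12, 57.375],
         to_rgb=True),
    dict(type='MMPad', size_divisor=32),  # pad to a size divisible by 32
]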
- class easycv.datasets.detection.pipelines.mm_transforms.MMMultiScaleFlipAug(transforms, img_scale=None, scale_factor=None, flip=False, flip_direction='horizontal')[source]¶
Bases:
object
Test-time augmentation with multiple scales and flipping.
An example configuration is as follows:
img_scale=[(1333, 400), (1333, 800)],
flip=True,
transforms=[
    dict(type='Resize', keep_ratio=True),
    dict(type='RandomFlip'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='Collect', keys=['img']),
]
After MultiScaleFlipAug with the above configuration, the results are wrapped into lists of the same length as follows:
dict(
    img=[...],
    img_shape=[...],
    scale=[(1333, 400), (1333, 400), (1333, 800), (1333, 800)],
    flip=[False, True, False, True],
    ...
)
- Parameters
transforms (list[dict]) – Transforms to apply in each augmentation.
img_scale (tuple | list[tuple] | None) – Images scales for resizing.
scale_factor (float | list[float] | None) – Scale factors for resizing.
flip (bool) – Whether apply flip augmentation. Default: False.
flip_direction (str | list[str]) – Flip augmentation directions, options are “horizontal”, “vertical” and “diagonal”. If flip_direction is a list, multiple flip augmentations will be applied. It has no effect when flip == False. Default: “horizontal”.
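Putting the pieces together, a test-time augmentation entry might be configured as in the sketch below; it simply restates the example configuration above, with the img_norm_cfg values filled in as assumptions:
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)  # assumed values
tta = dict(
    type='MMMultiScaleFlipAug',
    img_scale=[(1333, 400), (1333, 800)],
    flip=True,
    transforms=[
        dict(type='Resize', keep_ratio=True),
        dict(type='RandomFlip'),
        dict(type='Normalize', **img_norm_cfg),
        dict(type='Pad', size_divisor=32),
        dict(type='ImageToTensor', keys=['img']),
        dict(type='Collect', keys=['img']),
    ])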
Submodules¶
easycv.datasets.detection.mix module¶
- class easycv.datasets.detection.mix.DetImagesMixDataset(data_source, pipeline, dynamic_scale=None, skip_type_keys=None, profiling=False, classes=None, yolo_format=True, label_padding=True)[source]¶
Bases:
Generic[torch.utils.data.dataset.T_co]
A wrapper of multiple images mixed dataset.
Suitable for training on multiple images mixed data augmentation like mosaic and mixup. For the augmentation pipeline of mixed image data, the get_indexes method needs to be provided to obtain the image indexes, and you can set skip_flags to change the pipeline running process. At the same time, we provide the dynamic_scale parameter to dynamically change the output image size.
output boxes format: cx, cy, w, h
- Parameters
data_source (DetSourceCoco) – The dataset to be mixed.
pipeline (Sequence[dict]) – Sequence of transform objects or config dicts to be composed.
dynamic_scale (tuple[int], optional) – The image scale can be changed dynamically. Default to None.
skip_type_keys (list[str], optional) – Sequence of transform type strings whose pipeline steps will be skipped. Default to None.
label_padding – output label padding to shape [N, 120, 5]
- __init__(data_source, pipeline, dynamic_scale=None, skip_type_keys=None, profiling=False, classes=None, yolo_format=True, label_padding=True)[source]¶
Args:
data_source: Data_source config dict
pipeline: Pipeline config list
profiling: If set True, will print pipeline time
classes: A list of class names, used in evaluation for result and groundtruth visualization
- update_skip_type_keys(skip_type_keys)[source]¶
Update skip_type_keys. It is called by an external hook.
- Parameters
skip_type_keys (list[str], optional) – Sequence of transform type strings whose pipeline steps will be skipped.
- update_dynamic_scale(dynamic_scale)[source]¶
Update dynamic_scale. It is called by an external hook.
- Parameters
dynamic_scale (tuple[int]) – The image scale can be changed dynamically.
- results2json(results, outfile_prefix)[source]¶
Dump the detection results to a COCO style json file.
There are 3 types of results: proposals, bbox predictions, mask predictions, and they have different data types. This method will automatically recognize the type, and dump them to json files.
- Parameters
results (list[list | tuple | ndarray]) – Testing results of the dataset.
outfile_prefix (str) – The filename prefix of the json files. If the prefix is “somepath/xxx”, the json files will be named “somepath/xxx.bbox.json”, “somepath/xxx.segm.json”, “somepath/xxx.proposal.json”.
- Returns
Possible keys are “bbox”, “segm”, “proposal”, and values are corresponding filenames.
- Return type
dict[str, str]
- format_results(results, jsonfile_prefix=None, **kwargs)[source]¶
Format the results to json (standard format for COCO evaluation).
- Parameters
results (list[tuple | numpy.ndarray]) – Testing results of the dataset.
jsonfile_prefix (str | None) – The prefix of json files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.
- Returns
(result_files, tmp_dir), where result_files is a dict containing the json filepaths, and tmp_dir is the temporary directory created for saving json files when jsonfile_prefix is not specified.
- Return type
tuple
easycv.datasets.detection.raw module¶
- class easycv.datasets.detection.raw.DetDataset(data_source, pipeline, profiling=False, classes=None)[source]¶
Bases:
Generic[torch.utils.data.dataset.T_co]
Dataset for Detection
- __init__(data_source, pipeline, profiling=False, classes=None)[source]¶
- Parameters
data_source – Data_source config dict
pipeline – Pipeline config list
profiling – If set True, will print pipeline time
classes – A list of class names, used in evaluation for result and groundtruth visualization
- evaluate(results, evaluators=None, logger=None)[source]¶
Evaluates the detection boxes. :param results: A dictionary containing
- detection_boxes: List of length number of test images.
Float32 numpy array of shape [num_boxes, 4] and format [ymin, xmin, ymax, xmax] in absolute image coordinates.
- detection_scores: List of length number of test images,
detection scores for the boxes, float32 numpy array of shape [num_boxes].
- detection_classes: List of length number of test images,
integer numpy array of shape [num_boxes] containing 1-indexed detection classes for the boxes.
- img_metas: List of length number of test images,
dict of image meta info, containing filename, img_shape, origin_img_shape, scale_factor and so on.
- Parameters
evaluators – evaluators to calculate metric with results and groundtruth_dict
- visualize(results, vis_num=10, score_thr=0.3, **kwargs)[source]¶
Visualize the model output on validation data. :param results: A dictionary containing
- detection_boxes: List of length number of test images.
Float32 numpy array of shape [num_boxes, 4] and format [ymin, xmin, ymax, xmax] in absolute image coordinates.
- detection_scores: List of length number of test images,
detection scores for the boxes, float32 numpy array of shape [num_boxes].
- detection_classes: List of length number of test images,
integer numpy array of shape [num_boxes] containing 1-indexed detection classes for the boxes.
- img_metas: List of length number of test images,
dict of image meta info, containing filename, img_shape, origin_img_shape, scale_factor and so on.
- Parameters
vis_num – number of images visualized
score_thr – The threshold to filter box, boxes with scores greater than score_thr will be kept.
- Returns: A dictionary containing
images: Visualized images.
img_metas: List of length number of test images, dict of image meta info, containing filename, img_shape, origin_img_shape, scale_factor and so on.
easycv.datasets.loader package¶
- class easycv.datasets.loader.GroupSampler(dataset, samples_per_gpu=1)[source]¶
Bases:
Generic[torch.utils.data.sampler.T_co]
- class easycv.datasets.loader.DistributedGroupSampler(dataset, samples_per_gpu=1, num_replicas=None, rank=None)[source]¶
Bases:
Generic[torch.utils.data.sampler.T_co]
Sampler that restricts data loading to a subset of the dataset.
It is especially useful in conjunction with
torch.nn.parallel.DistributedDataParallel
. In such case, each process can pass a DistributedSampler instance as a DataLoader sampler, and load a subset of the original dataset that is exclusive to it. Note: the dataset is assumed to be of constant size.
- Parameters
dataset – Dataset used for sampling.
num_replicas (optional) – Number of processes participating in distributed training.
rank (optional) – Rank of the current process within num_replicas.
- easycv.datasets.loader.build_dataloader(dataset, imgs_per_gpu, workers_per_gpu, num_gpus=1, dist=True, shuffle=True, replace=False, seed=None, reuse_worker_cache=False, odps_config=None, persistent_workers=False, **kwargs)[source]¶
Build PyTorch DataLoader.
In distributed training, each GPU/process has a dataloader. In non-distributed training, there is only one dataloader for all GPUs.
- Parameters
dataset (Dataset) – A PyTorch dataset.
imgs_per_gpu (int) – Number of images on each GPU, i.e., batch size of each GPU.
workers_per_gpu (int) – How many subprocesses to use for data loading for each GPU.
num_gpus (int) – Number of GPUs. Only used in non-distributed training.
dist (bool) – Distributed training/test or not. Default: True.
shuffle (bool) – Whether to shuffle the data at every epoch. Default: True.
replace (bool) – Whether to sample with replacement in random shuffle. It only works when shuffle is True.
reuse_worker_cache (bool) – If set to True, worker processes will be reused so that cached data in the worker processes can be reused.
persistent_workers (bool) – Since PyTorch 1.7, persistent_workers=True can be used to avoid reconstructing data workers before each epoch, which speeds up epoch startup.
kwargs – any keyword argument to be used to initialize DataLoader
- Returns
A PyTorch dataloader.
- Return type
DataLoader
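A minimal sketch of building a loader from an already constructed dataset with the parameters above; the concrete values are only illustrative:
from easycv.datasets.loader import build_dataloader

# `dataset` is assumed to be an already built EasyCV dataset instance
data_loader = build_dataloader(
    dataset,
    imgs_per_gpu=16,      # batch size per GPU
    workers_per_gpu=2,    # dataloading subprocesses per GPU
    dist=False,           # non-distributed loading
    shuffle=True,
    seed=42)

for batch in data_loader:
    pass  # feed each batch to the model / evaluation loop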
- class easycv.datasets.loader.DistributedGivenIterationSampler(dataset, total_iter, batch_size, num_replicas=None, rank=None, last_iter=- 1)[source]¶
Bases:
Generic[torch.utils.data.sampler.T_co]
Submodules¶
easycv.datasets.loader.build_loader module¶
- easycv.datasets.loader.build_loader.build_dataloader(dataset, imgs_per_gpu, workers_per_gpu, num_gpus=1, dist=True, shuffle=True, replace=False, seed=None, reuse_worker_cache=False, odps_config=None, persistent_workers=False, **kwargs)[source]¶
Build PyTorch DataLoader.
In distributed training, each GPU/process has a dataloader. In non-distributed training, there is only one dataloader for all GPUs.
- Parameters
dataset (Dataset) – A PyTorch dataset.
imgs_per_gpu (int) – Number of images on each GPU, i.e., batch size of each GPU.
workers_per_gpu (int) – How many subprocesses to use for data loading for each GPU.
num_gpus (int) – Number of GPUs. Only used in non-distributed training.
dist (bool) – Distributed training/test or not. Default: True.
shuffle (bool) – Whether to shuffle the data at every epoch. Default: True.
replace (bool) – Whether to sample with replacement in random shuffle. It only works when shuffle is True.
reuse_worker_cache (bool) – If set to True, worker processes will be reused so that cached data in the worker processes can be reused.
persistent_workers (bool) – Since PyTorch 1.7, persistent_workers=True can be used to avoid reconstructing data workers before each epoch, which speeds up epoch startup.
kwargs – any keyword argument to be used to initialize DataLoader
- Returns
A PyTorch dataloader.
- Return type
DataLoader
- class easycv.datasets.loader.build_loader.InfiniteDataLoader(*args, **kwargs)[source]¶
Bases:
Generic[torch.utils.data.dataloader.T_co]
Dataloader that reuses workers. https://github.com/pytorch/pytorch/issues/15849
Uses same syntax as vanilla DataLoader.
- dataset: torch.utils.data.dataset.Dataset[torch.utils.data.dataloader.T_co]¶
- batch_size: Optional[int]¶
- num_workers: int¶
- pin_memory: bool¶
- drop_last: bool¶
- timeout: float¶
- sampler: Union[torch.utils.data.sampler.Sampler, Iterable]¶
- prefetch_factor: int¶
easycv.datasets.loader.sampler module¶
- class easycv.datasets.loader.sampler.DistributedMPSampler(dataset, num_replicas=None, rank=None, shuffle=True, split_huge_listfile_byrank=False)[source]¶
Bases:
torch.utils.data.sampler.Sampler[torch.utils.data.distributed.T_co]
- __init__(dataset, num_replicas=None, rank=None, shuffle=True, split_huge_listfile_byrank=False)[source]¶
A distributed sampler which supports sampling m instances from one class at a time for classification datasets.
dataset: pytorch dataset object
num_replicas (optional): Number of processes participating in distributed training.
rank (optional): Rank of the current process within num_replicas.
shuffle (optional): If true (default), sampler will shuffle the indices.
split_huge_listfile_byrank: if split, return all indices for each rank, because the list for each rank has been split before building the dataset in distributed training.
- class easycv.datasets.loader.sampler.DistributedSampler(dataset, num_replicas=None, rank=None, shuffle=True, replace=False, split_huge_listfile_byrank=False)[source]¶
Bases:
torch.utils.data.sampler.Sampler[torch.utils.data.distributed.T_co]
- __init__(dataset, num_replicas=None, rank=None, shuffle=True, replace=False, split_huge_listfile_byrank=False)[source]¶
A distributed sampler which supports sampling m instances from one class at a time for classification datasets.
- Parameters
dataset – pytorch dataset object
num_replicas (optional) – Number of processes participating in distributed training.
rank (optional) – Rank of the current process within num_replicas.
shuffle (optional) – If true (default), sampler will shuffle the indices.
split_huge_listfile_byrank – if split, return all indices for each rank, because the list for each rank has been split before building the dataset in distributed training.
- class easycv.datasets.loader.sampler.GroupSampler(dataset, samples_per_gpu=1)[source]¶
Bases:
Generic[torch.utils.data.sampler.T_co]
- class easycv.datasets.loader.sampler.DistributedGroupSampler(dataset, samples_per_gpu=1, num_replicas=None, rank=None)[source]¶
Bases:
Generic[torch.utils.data.sampler.T_co]
Sampler that restricts data loading to a subset of the dataset.
It is especially useful in conjunction with
torch.nn.parallel.DistributedDataParallel
. In such case, each process can pass a DistributedSampler instance as a DataLoader sampler, and load a subset of the original dataset that is exclusive to it. Note: the dataset is assumed to be of constant size.
- Parameters
dataset – Dataset used for sampling.
num_replicas (optional) – Number of processes participating in distributed training.
rank (optional) – Rank of the current process within num_replicas.
- class easycv.datasets.loader.sampler.DistributedGivenIterationSampler(dataset, total_iter, batch_size, num_replicas=None, rank=None, last_iter=- 1)[source]¶
Bases:
Generic[torch.utils.data.sampler.T_co]
easycv.datasets.pose package¶
- class easycv.datasets.pose.PoseTopDownDataset(data_source, pipeline, profiling=False)[source]¶
Bases:
Generic[torch.utils.data.dataset.T_co]
PoseTopDownDataset dataset for top-down pose estimation. The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
- Parameters
data_source – Data_source config dict
pipeline – Pipeline config list
profiling – If set True, will print pipeline time
Subpackages¶
easycv.datasets.pose.data_sources package¶
- class easycv.datasets.pose.data_sources.PoseTopDownSourceCoco(ann_file, img_prefix, data_cfg, dataset_info=None, test_mode=False)[source]¶
Bases:
easycv.datasets.pose.data_sources.top_down.PoseTopDownSource
CocoSource for top-down pose estimation.
“Microsoft COCO: Common Objects in Context”, ECCV 2014. More details can be found in the paper.
The source loads raw features to build a data meta object containing the image info, annotation info and others.
COCO keypoint indexes:
0: 'nose', 1: 'left_eye', 2: 'right_eye', 3: 'left_ear', 4: 'right_ear', 5: 'left_shoulder', 6: 'right_shoulder', 7: 'left_elbow', 8: 'right_elbow', 9: 'left_wrist', 10: 'right_wrist', 11: 'left_hip', 12: 'right_hip', 13: 'left_knee', 14: 'right_knee', 15: 'left_ankle', 16: 'right_ankle'
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
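For convenience, the keypoint order listed above can be written down as a plain Python mapping, e.g. for looking up joint names when post-processing predictions (this mapping just restates the documented order):
# COCO keypoint index -> name, copied from the order documented above
COCO_KEYPOINT_NAMES = {
    0: 'nose', 1: 'left_eye', 2: 'right_eye', 3: 'left_ear', 4: 'right_ear',
    5: 'left_shoulder', 6: 'right_shoulder', 7: 'left_elbow', 8: 'right_elbow',
    9: 'left_wrist', 10: 'right_wrist', 11: 'left_hip', 12: 'right_hip',
    13: 'left_knee', 14: 'right_knee', 15: 'left_ankle', 16: 'right_ankle',
}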
- class easycv.datasets.pose.data_sources.PoseTopDownSource(ann_file, img_prefix, data_cfg, dataset_info, coco_style=True, test_mode=False)[source]¶
Bases:
object
Class for keypoint 2D top-down pose estimation with single-view RGB image as the data source.
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
dataset_info (DatasetInfo) – A class containing all dataset info.
coco_style (bool) – Whether the annotation json is coco-style. Default: True
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- class easycv.datasets.pose.data_sources.coco.PoseTopDownSourceCoco(ann_file, img_prefix, data_cfg, dataset_info=None, test_mode=False)[source]¶
Bases:
easycv.datasets.pose.data_sources.top_down.PoseTopDownSource
CocoSource for top-down pose estimation.
“Microsoft COCO: Common Objects in Context”, ECCV 2014. More details can be found in the paper.
The source loads raw features to build a data meta object containing the image info, annotation info and others.
COCO keypoint indexes:
0: 'nose', 1: 'left_eye', 2: 'right_eye', 3: 'left_ear', 4: 'right_ear', 5: 'left_shoulder', 6: 'right_shoulder', 7: 'left_elbow', 8: 'right_elbow', 9: 'left_wrist', 10: 'right_wrist', 11: 'left_hip', 12: 'right_hip', 13: 'left_knee', 14: 'right_knee', 15: 'left_ankle', 16: 'right_ankle'
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- class easycv.datasets.pose.data_sources.top_down.PoseTopDownSource(ann_file, img_prefix, data_cfg, dataset_info, coco_style=True, test_mode=False)[source]¶
Bases:
object
Class for keypoint 2D top-down pose estimation with single-view RGB image as the data source.
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
dataset_info (DatasetInfo) – A class containing all dataset info.
coco_style (bool) – Whether the annotation json is coco-style. Default: True
test_mode (bool) – Store True when building test or validation dataset. Default: False.
easycv.datasets.pose.pipelines package¶
- class easycv.datasets.pose.pipelines.PoseCollect(keys, meta_keys, meta_name='img_metas')[source]¶
Bases:
object
Collect data from the loader relevant to the specific task.
This keeps the items in keys as they are, and collects items in meta_keys into a meta item called meta_name. This is usually the last stage of the data loader pipeline. For example, when keys=’imgs’, meta_keys=(‘filename’, ‘label’, ‘original_shape’), meta_name=’img_metas’, the results will be a dict with keys ‘imgs’ and ‘img_metas’, where ‘img_metas’ is a DataContainer of another dict with keys ‘filename’, ‘label’, ‘original_shape’.
- Parameters
keys (Sequence[str|tuple]) – Required keys to be collected. If a tuple (key, key_new) is given as an element, the item retrieved by key will be renamed as key_new in collected data.
meta_name (str) – The name of the key that contains meta information. This key is always populated. Default: “img_metas”.
meta_keys (Sequence[str|tuple]) – Keys that are collected under meta_name. The contents of the meta_name dictionary depend on meta_keys.
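A pipeline-entry sketch mirroring the example in the docstring above; the meta key names are taken from that example and may differ in a real pose config:
collect = dict(
    type='PoseCollect',
    keys=['imgs'],
    meta_keys=('filename', 'label', 'original_shape'),
    meta_name='img_metas')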
- class easycv.datasets.pose.pipelines.TopDownRandomFlip(flip_prob=0.5)[source]¶
Bases:
object
Data augmentation with random image flip.
Required keys: ‘img’, ‘joints_3d’, ‘joints_3d_visible’, ‘center’ and ‘ann_info’. Modifies key: ‘img’, ‘joints_3d’, ‘joints_3d_visible’, ‘center’ and ‘flipped’.
- Parameters
flip (bool) – Option to perform random flip.
flip_prob (float) – Probability of flip.
- class easycv.datasets.pose.pipelines.TopDownHalfBodyTransform(num_joints_half_body=8, prob_half_body=0.3)[source]¶
Bases:
object
Data augmentation with half-body transform. Keep only the upper body or the lower body at random.
Required keys: ‘joints_3d’, ‘joints_3d_visible’, and ‘ann_info’. Modifies key: ‘scale’ and ‘center’.
- Parameters
num_joints_half_body (int) – Threshold for performing the half-body transform. If the body has fewer joints than num_joints_half_body, this step is skipped.
prob_half_body (float) – Probability of half-body transform.
- class easycv.datasets.pose.pipelines.TopDownGetRandomScaleRotation(rot_factor=40, scale_factor=0.5, rot_prob=0.6)[source]¶
Bases:
object
Data augmentation with random scaling & rotating.
Required key: ‘scale’. Modifies key: ‘scale’ and ‘rotation’.
- Parameters
rot_factor (int) – Rotating to [-2*rot_factor, 2*rot_factor].
scale_factor (float) – Scaling to [1-scale_factor, 1+scale_factor].
rot_prob (float) – Probability of random rotation.
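A short sketch of how the documented ranges translate into sampled values; this is illustrative only and not the library's exact sampling code:
import numpy as np

def sample_scale_rotation(rot_factor=40, scale_factor=0.5, rot_prob=0.6):
    # scaling factor drawn from [1 - scale_factor, 1 + scale_factor]
    scale = np.random.uniform(1 - scale_factor, 1 + scale_factor)
    # rotation drawn from [-2 * rot_factor, 2 * rot_factor], applied with probability rot_prob
    rotation = np.random.uniform(-2 * rot_factor, 2 * rot_factor)
    if np.random.rand() >= rot_prob:
        rotation = 0.0
    return scale, rotation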
- class easycv.datasets.pose.pipelines.TopDownAffine(use_udp=False)[source]¶
Bases:
object
Affine transform the image to make the network input.
Required keys: ‘img’, ‘joints_3d’, ‘joints_3d_visible’, ‘ann_info’, ‘scale’, ‘rotation’ and ‘center’. Modified keys: ‘img’, ‘joints_3d’, and ‘joints_3d_visible’.
- Parameters
use_udp (bool) – To use unbiased data processing. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).
- class easycv.datasets.pose.pipelines.TopDownGenerateTarget(sigma=2, kernel=(11, 11), valid_radius_factor=0.0546875, target_type='GaussianHeatmap', encoding='MSRA', unbiased_encoding=False)[source]¶
Bases:
object
Generate the target heatmap.
Required keys: ‘joints_3d’, ‘joints_3d_visible’, ‘ann_info’. Modified keys: ‘target’, and ‘target_weight’.
- Parameters
sigma – Sigma of heatmap gaussian for ‘MSRA’ approach.
kernel – Kernel of heatmap gaussian for ‘Megvii’ approach.
encoding (str) – Approach to generate target heatmaps. Currently supported approaches: ‘MSRA’, ‘Megvii’, ‘UDP’. Default:’MSRA’
unbiased_encoding (bool) – Option to use unbiased encoding methods. Paper ref: Zhang et al. Distribution-Aware Coordinate Representation for Human Pose Estimation (CVPR 2020).
keypoint_pose_distance – Keypoint pose distance for UDP. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).
target_type (str) – supported targets: ‘GaussianHeatmap’, ‘CombinedTarget’. Default:’GaussianHeatmap’ CombinedTarget: The combination of classification target (response map) and regression target (offset map). Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).
- class easycv.datasets.pose.pipelines.TopDownGenerateTargetRegression[source]¶
Bases:
object
Generate the target regression vector (coordinates).
Required keys: ‘joints_3d’, ‘joints_3d_visible’, ‘ann_info’. Modified keys: ‘target’, and ‘target_weight’.
- class easycv.datasets.pose.pipelines.TopDownRandomTranslation(trans_factor=0.15, trans_prob=1.0)[source]¶
Bases:
object
Data augmentation with random translation.
Required keys: ‘scale’ and ‘center’. Modifies key: ‘center’.
Notes
bbox height: H, bbox width: W
- Parameters
trans_factor (float) – Translating center to [-trans_factor, trans_factor] * [W, H] + center.
trans_prob (float) – Probability of random translation.
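A small illustration of the translation formula above (an assumed helper, not the library's implementation):
import numpy as np

def translate_center(center, bbox_w, bbox_h, trans_factor=0.15):
    # new center = center + uniform(-trans_factor, trans_factor) * [W, H]
    offset = np.random.uniform(-trans_factor, trans_factor, size=2) * np.array([bbox_w, bbox_h])
    return np.asarray(center, dtype=np.float32) + offset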
- class easycv.datasets.pose.pipelines.transforms.PoseCollect(keys, meta_keys, meta_name='img_metas')[source]¶
Bases:
object
Collect data from the loader relevant to the specific task.
This keeps the items in keys as they are, and collects items in meta_keys into a meta item called meta_name. This is usually the last stage of the data loader pipeline. For example, when keys=’imgs’, meta_keys=(‘filename’, ‘label’, ‘original_shape’), meta_name=’img_metas’, the results will be a dict with keys ‘imgs’ and ‘img_metas’, where ‘img_metas’ is a DataContainer of another dict with keys ‘filename’, ‘label’, ‘original_shape’.
- Parameters
keys (Sequence[str|tuple]) – Required keys to be collected. If a tuple (key, key_new) is given as an element, the item retrieved by key will be renamed as key_new in collected data.
meta_name (str) – The name of the key that contains meta information. This key is always populated. Default: “img_metas”.
meta_keys (Sequence[str|tuple]) – Keys that are collected under meta_name. The contents of the meta_name dictionary depend on meta_keys.
- class easycv.datasets.pose.pipelines.transforms.TopDownRandomFlip(flip_prob=0.5)[source]¶
Bases:
object
Data augmentation with random image flip.
Required keys: ‘img’, ‘joints_3d’, ‘joints_3d_visible’, ‘center’ and ‘ann_info’. Modifies key: ‘img’, ‘joints_3d’, ‘joints_3d_visible’, ‘center’ and ‘flipped’.
- Parameters
flip (bool) – Option to perform random flip.
flip_prob (float) – Probability of flip.
- class easycv.datasets.pose.pipelines.transforms.TopDownHalfBodyTransform(num_joints_half_body=8, prob_half_body=0.3)[source]¶
Bases:
object
Data augmentation with half-body transform. Keep only the upper body or the lower body at random.
Required keys: ‘joints_3d’, ‘joints_3d_visible’, and ‘ann_info’. Modifies key: ‘scale’ and ‘center’.
- Parameters
num_joints_half_body (int) – Threshold for performing the half-body transform. If the body has fewer joints than num_joints_half_body, this step is skipped.
prob_half_body (float) – Probability of half-body transform.
- class easycv.datasets.pose.pipelines.transforms.TopDownGetRandomScaleRotation(rot_factor=40, scale_factor=0.5, rot_prob=0.6)[source]¶
Bases:
object
Data augmentation with random scaling & rotating.
Required key: ‘scale’. Modifies key: ‘scale’ and ‘rotation’.
- Parameters
rot_factor (int) – Rotating to [-2*rot_factor, 2*rot_factor].
scale_factor (float) – Scaling to [1-scale_factor, 1+scale_factor].
rot_prob (float) – Probability of random rotation.
- class easycv.datasets.pose.pipelines.transforms.TopDownAffine(use_udp=False)[source]¶
Bases:
object
Affine transform the image to make the network input.
Required keys: ‘img’, ‘joints_3d’, ‘joints_3d_visible’, ‘ann_info’, ‘scale’, ‘rotation’ and ‘center’. Modified keys: ‘img’, ‘joints_3d’, and ‘joints_3d_visible’.
- Parameters
use_udp (bool) – To use unbiased data processing. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).
- class easycv.datasets.pose.pipelines.transforms.TopDownGenerateTarget(sigma=2, kernel=(11, 11), valid_radius_factor=0.0546875, target_type='GaussianHeatmap', encoding='MSRA', unbiased_encoding=False)[source]¶
Bases:
object
Generate the target heatmap.
Required keys: ‘joints_3d’, ‘joints_3d_visible’, ‘ann_info’. Modified keys: ‘target’, and ‘target_weight’.
- Parameters
sigma – Sigma of heatmap gaussian for ‘MSRA’ approach.
kernel – Kernel of heatmap gaussian for ‘Megvii’ approach.
encoding (str) – Approach to generate target heatmaps. Currently supported approaches: ‘MSRA’, ‘Megvii’, ‘UDP’. Default:’MSRA’
unbiased_encoding (bool) – Option to use unbiased encoding methods. Paper ref: Zhang et al. Distribution-Aware Coordinate Representation for Human Pose Estimation (CVPR 2020).
keypoint_pose_distance – Keypoint pose distance for UDP. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).
target_type (str) – supported targets: ‘GaussianHeatmap’, ‘CombinedTarget’. Default:’GaussianHeatmap’ CombinedTarget: The combination of classification target (response map) and regression target (offset map). Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).
- class easycv.datasets.pose.pipelines.transforms.TopDownGenerateTargetRegression[source]¶
Bases:
object
Generate the target regression vector (coordinates).
Required keys: ‘joints_3d’, ‘joints_3d_visible’, ‘ann_info’. Modified keys: ‘target’, and ‘target_weight’.
- class easycv.datasets.pose.pipelines.transforms.TopDownRandomTranslation(trans_factor=0.15, trans_prob=1.0)[source]¶
Bases:
object
Data augmentation with random translation.
Required keys: ‘scale’ and ‘center’. Modifies key: ‘center’.
Notes
bbox height: H, bbox width: W
- Parameters
trans_factor (float) – Translating center to [-trans_factor, trans_factor] * [W, H] + center.
trans_prob (float) – Probability of random translation.
Submodules¶
easycv.datasets.pose.top_down module¶
- class easycv.datasets.pose.top_down.PoseTopDownDataset(data_source, pipeline, profiling=False)[source]¶
Bases:
Generic[torch.utils.data.dataset.T_co]
PoseTopDownDataset dataset for top-down pose estimation. The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
- Parameters
data_source – Data_source config dict
pipeline – Pipeline config list
profiling – If set True, will print pipeline time
easycv.datasets.selfsup package¶
Subpackages¶
easycv.datasets.selfsup.data_sources package¶
- class easycv.datasets.selfsup.data_sources.SSLSourceImageList(list_file, root='', max_try=20)[source]¶
Bases:
object
datasource for classification
- Parameters
list_file – str / list(str). A str means a single input image list file path; the file contains records of the form “image_path label”. A list(str) means multiple image list files, each containing such records. An example appears below.
root – str / list(str), root path for image_path. Each list_file needs a root; if len(root) < len(list_file), root[-1] is used to fill the root list.
max_try – int, maximum number of attempts to read an image
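A sketch of the expected list-file contents and of constructing the source; all paths below are placeholders:
# example contents of a list file, one "image_path label" record per line:
#   train/n01440764/img_0001.JPEG 0
#   train/n01440764/img_0002.JPEG 0
#   train/n01443537/img_0001.JPEG 1

from easycv.datasets.selfsup.data_sources import SSLSourceImageList

source = SSLSourceImageList(
    list_file='data/imagenet/train_list.txt',  # placeholder path
    root='data/imagenet/',                     # prepended to each image_path
    max_try=20)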
- class easycv.datasets.selfsup.data_sources.SSLSourceImageNetFeature(root_path, training=True, data_keyword='feat1', label_keyword='label', dynamic_load=True)[source]¶
Bases:
object
- class easycv.datasets.selfsup.data_sources.image_list.SSLSourceImageList(list_file, root='', max_try=20)[source]¶
Bases:
object
datasource for classification
- Parameters
list_file – str / list(str). A str means a single input image list file path; the file contains records of the form “image_path label”. A list(str) means multiple image list files, each containing such records.
root – str / list(str), root path for image_path. Each list_file needs a root; if len(root) < len(list_file), root[-1] is used to fill the root list.
max_try – int, maximum number of attempts to read an image
- class easycv.datasets.selfsup.data_sources.imagenet_feature.SSLSourceImageNetFeature(root_path, training=True, data_keyword='feat1', label_keyword='label', dynamic_load=True)[source]¶
Bases:
object
easycv.datasets.selfsup.pipelines package¶
- class easycv.datasets.selfsup.pipelines.RandomAppliedTrans(transforms, p=0.5)[source]¶
Bases:
object
Randomly applied transformations.
- Parameters
transforms (List[Dict]) – List of transformations in dictionaries.
- class easycv.datasets.selfsup.pipelines.Lighting[source]¶
Bases:
object
Lighting noise (AlexNet-style PCA-based noise)
- class easycv.datasets.selfsup.pipelines.transforms.MAEFtAugment(input_size=None, color_jitter=None, auto_augment=None, interpolation=None, re_prob=None, re_mode=None, re_count=None, mean=None, std=None, is_train=True)[source]¶
Bases:
object
RandAugment data augmentation method based on “RandAugment: Practical automated data augmentation with a reduced search space”. This code is borrowed from https://github.com/pengzhiliang/MAE-pytorch
- Parameters
input_size (int) – images input size
color_jitter (float) – Color jitter factor
auto_augment – Use AutoAugment policy
interpolation – Training interpolation
re_prob – Random erase prob
re_mode – Random erase mode
re_count – Random erase count
mean – mean used for normalization
std – std used for normalization
is_train – If True, use all augmentation strategies
- class easycv.datasets.selfsup.pipelines.transforms.RandomAppliedTrans(transforms, p=0.5)[source]¶
Bases:
object
Randomly applied transformations.
- Parameters
transforms (List[Dict]) – List of transformations in dictionaries.
easycv.datasets.utils package¶
Submodules¶
easycv.datasets.utils.tfrecord_util module¶
- easycv.datasets.utils.tfrecord_util.download_tfrecord(file_list_or_path, target_path, slice_count=1, slice_id=0, force=False)[source]¶
Download data from oss. Use the processes on the gpus to slice the download; each gpu process downloads part of the data. The number of slices is the same as the number of gpu processes. Supports tfrecords in ImageNet style.
- Parameters
file_list_or_path – A list of absolute data paths or a path str. If type(file_list) == list, it is used as the file list directly; if type(file_list) == str, the file list is obtained by open(file_list).readlines().
target_path – A str, download path
slice_count – Download worker num
slice_id – Download worker ID
force – If false, skip download if the file already exists in the target path. If true, recopy and replace the original file.
- Returns
path: list of str, downloaded tfrecord paths
index_path: list of str, downloaded tfrecord idx paths
- Return type
path
Submodules¶
easycv.datasets.builder module¶
easycv.datasets.registry module¶
easycv.hooks package¶
Submodules¶
easycv.hooks.best_ckpt_saver_hook module¶
- class easycv.hooks.best_ckpt_saver_hook.BestCkptSaverHook(by_epoch=True, save_optimizer=True, best_metric_name=[], best_metric_type=[], **kwargs)[source]¶
Bases:
mmcv.runner.hooks.hook.Hook
Save checkpoints periodically.
- Parameters
by_epoch (bool) – Saving checkpoints by epoch or by iteration. Default: True.
save_optimizer (bool) – Whether to save optimizer state_dict in the checkpoint. It is usually used for resuming experiments. Default: True.
best_metric_name (List(str)) – metric names to save best, such as “neck_top1”… Default: [], do not save anything.
best_metric_type (List(str)) – metric types to define best, should be “max” or “min”; if len(best_metric_type) < len(best_metric_name), “max” is appended to fill the list.
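Assuming the hook is registered through an mmcv-style config dict (an assumption; the exact config key may differ between EasyCV config files), an entry might look like:
custom_hooks = [
    dict(
        type='BestCkptSaverHook',
        by_epoch=True,
        save_optimizer=True,
        best_metric_name=['neck_top1'],  # example metric name from the docstring
        best_metric_type=['max'])
]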
easycv.hooks.byol_hook module¶
- class easycv.hooks.byol_hook.BYOLHook(end_momentum=1.0, **kwargs)[source]¶
Bases:
mmcv.runner.hooks.hook.Hook
Hook in BYOL
- This hook includes momentum adjustment in BYOL following:
m = 1 - ( 1- m_0) * (cos(pi * k / K) + 1) / 2, k: current step, K: total steps.
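The schedule above is easy to evaluate directly; a minimal sketch with m_0 = 0.996 (a typical BYOL base momentum, used here only for illustration):
import math

def byol_momentum(m_0, k, K):
    # m = 1 - (1 - m_0) * (cos(pi * k / K) + 1) / 2
    return 1 - (1 - m_0) * (math.cos(math.pi * k / K) + 1) / 2

print(byol_momentum(0.996, 0, 10000))      # 0.996 at the first step
print(byol_momentum(0.996, 10000, 10000))  # 1.0 at the last step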
easycv.hooks.dino_hook module¶
- easycv.hooks.dino_hook.cosine_scheduler(base_value, final_value, epochs, niter_per_ep, warmup_epochs=0, start_warmup_value=0)[source]¶
- class easycv.hooks.dino_hook.DINOHook(momentum_teacher=0.996, weight_decay=0.04, weight_decay_end=0.4, **kwargs)[source]¶
Bases:
mmcv.runner.hooks.hook.Hook
Hook in DINO
easycv.hooks.ema_hook module¶
- class easycv.hooks.ema_hook.ModelEMA(model, decay=0.9999, updates=0)[source]¶
Bases:
object
Model Exponential Moving Average from https://github.com/rwightman/pytorch-image-models Keep a moving average of everything in the model state_dict (parameters and buffers). This is intended to allow functionality like https://www.tensorflow.org/api_docs/python/tf/train/ExponentialMovingAverage A smoothed version of the weights is necessary for some training schemes to perform well. This class is sensitive where it is initialized in the sequence of model init, GPU assignment and distributed training wrappers.
In YOLOv5s, EMA helps increase mAP from 0.27 to 0.353.
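A generic sketch of the exponential-moving-average update such a helper maintains (standard EMA over the state_dict, not necessarily EasyCV's exact decay ramp-up):
import copy
import torch

class SimpleEMA:
    """Keep an exponential moving average of a model's floating-point state."""

    def __init__(self, model, decay=0.9999):
        self.ema = copy.deepcopy(model).eval()
        self.decay = decay
        for p in self.ema.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        msd = model.state_dict()
        for name, ema_v in self.ema.state_dict().items():
            if ema_v.dtype.is_floating_point:
                # ema = decay * ema + (1 - decay) * current
                ema_v.copy_(ema_v * self.decay + (1.0 - self.decay) * msd[name])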
- class easycv.hooks.ema_hook.EMAHook(decay=0.9999, copy_model_attr=())[source]¶
Bases:
mmcv.runner.hooks.hook.Hook
Hook to carry out Exponential Moving Average
easycv.hooks.eval_hook module¶
- class easycv.hooks.eval_hook.EvalHook(dataloader, initial=False, interval=1, mode='test', flush_buffer=True, **eval_kwargs)[source]¶
Bases:
mmcv.runner.hooks.hook.Hook
Evaluation hook.
- dataloader¶
A PyTorch dataloader.
- Type
DataLoader
- interval¶
Evaluation interval (by epochs). Default: 1.
- Type
int
- mode¶
model forward mode
- Type
str
- flush_buffer¶
flush log buffer
- Type
bool
- class easycv.hooks.eval_hook.DistEvalHook(dataloader, interval=1, mode='test', initial=False, gpu_collect=False, flush_buffer=True, **eval_kwargs)[source]¶
Bases:
easycv.hooks.eval_hook.EvalHook
Distributed evaluation hook.
- dataloader¶
A PyTorch dataloader.
- Type
DataLoader
- interval¶
Evaluation interval (by epochs). Default: 1.
- Type
int
- mode¶
model forward mode
- Type
str
- tmpdir¶
Temporary directory to save the results of all processes. Default: None.
- Type
str | None
- gpu_collect¶
Whether to use gpu or cpu to collect results. Default: False.
- Type
bool
easycv.hooks.export_hook module¶
- class easycv.hooks.export_hook.ExportHook(cfg, ckpt_filename_tmpl='epoch_{}.pth', export_ckpt_filename_tmpl='epoch_{}_export.pt', export_after_each_ckpt=False)[source]¶
Bases:
mmcv.runner.hooks.hook.Hook
export model when training on pai
easycv.hooks.extractor module¶
easycv.hooks.optimizer_hook module¶
- class easycv.hooks.optimizer_hook.OptimizerHook(update_interval=1, grad_clip=None, coalesce=True, bucket_size_mb=- 1, ignore_key=[], ignore_key_epoch=[], multiply_key=[], multiply_rate=[])[source]¶
Bases:
mmcv.runner.hooks.optimizer.OptimizerHook
- __init__(update_interval=1, grad_clip=None, coalesce=True, bucket_size_mb=- 1, ignore_key=[], ignore_key_epoch=[], multiply_key=[], multiply_rate=[])[source]¶
ignore_key ([str, …]) – ignore_key[i] is the name of parameters whose gradients will be set to zero before every optimizer step when epoch < ignore_key_epoch[i].
ignore_key_epoch ([int, …]) – while epoch < ignore_key_epoch[i], the gradients of ignore_key[i] will be set to zero.
multiply_key ([str, …]) – multiply_key[i] is the name of parameters that use a different learning rate ratio, specified by multiply_rate.
multiply_rate ([float, …]) – multiply_rate[i] is the learning rate ratio for multiply_key[i].
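An illustrative optimizer-hook configuration using the parameters above; the module name fragments ('backbone', 'head') and numeric values are placeholders:
optimizer_config = dict(
    type='OptimizerHook',
    update_interval=1,
    grad_clip=dict(max_norm=10.0),  # illustrative gradient-clipping setting
    ignore_key=['backbone'],        # zero gradients of parameters whose name contains 'backbone'
    ignore_key_epoch=[5],           # ...while epoch < 5
    multiply_key=['head'],          # apply a learning-rate ratio to 'head' parameters
    multiply_rate=[10.0])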
- class easycv.hooks.optimizer_hook.AMPFP16OptimizerHook(update_interval=1, grad_clip=None, coalesce=True, bucket_size_mb=- 1, ignore_key=[], ignore_key_epoch=[])[source]¶
Bases:
easycv.hooks.optimizer_hook.OptimizerHook
- __init__(update_interval=1, grad_clip=None, coalesce=True, bucket_size_mb=- 1, ignore_key=[], ignore_key_epoch=[])[source]¶
ignore_key ([str, …]) – ignore_key[i] is the name of parameters whose gradients will be set to zero before every optimizer step when epoch < ignore_key_epoch[i].
ignore_key_epoch ([int, …]) – while epoch < ignore_key_epoch[i], the gradients of ignore_key[i] will be set to zero.
easycv.hooks.oss_sync_hook module¶
- class easycv.hooks.oss_sync_hook.OSSSyncHook(work_dir, oss_work_dir, interval=1, ckpt_filename_tmpl='epoch_{}.pth', export_ckpt_filename_tmpl='epoch_{}_export.pt', other_file_list=[], iter_interval=None)[source]¶
Bases:
mmcv.runner.hooks.hook.Hook
upload log files and checkpoints to oss when training on pai
- __init__(work_dir, oss_work_dir, interval=1, ckpt_filename_tmpl='epoch_{}.pth', export_ckpt_filename_tmpl='epoch_{}_export.pt', other_file_list=[], iter_interval=None)[source]¶
- Parameters
work_dir – work_dir in cfg
oss_work_dir – oss directory where to upload local files in work_dir
interval – upload frequency
ckpt_filename_tmpl – checkpoint filename template
other_file_list – other file need to be upload to oss
iter_interval – upload frequency by iteration interval. Defaults to None, which means uploading by iteration is only done when this is explicitly assigned.
easycv.hooks.registry module¶
easycv.hooks.show_time_hook module¶
easycv.hooks.swav_hook module¶
easycv.hooks.sync_norm_hook module¶
- class easycv.hooks.sync_norm_hook.SyncNormHook(no_aug_epochs=15, interval=1, **kwargs)[source]¶
Bases:
mmcv.runner.hooks.hook.Hook
Synchronize Norm states after training epoch, currently used in YOLOX.
- Parameters
no_aug_epochs (int) – The number of epochs at the end of training during which norm states are synchronized at the given interval. Default: 15.
interval (int) – Synchronizing norm interval. Default: 1.
easycv.hooks.sync_random_size_hook module¶
- class easycv.hooks.sync_random_size_hook.SyncRandomSizeHook(ratio_range=(14, 26), img_scale=(640, 640), interval=10, device='cuda', **kwargs)[source]¶
Bases:
mmcv.runner.hooks.hook.Hook
Change and synchronize the random image size across ranks, currently used in YOLOX.
- Parameters
ratio_range (tuple[int]) – Random ratio range. It will be multiplied by 32, and then change the dataset output image size. Default: (14, 26).
img_scale (tuple[int]) – Size of input image. Default: (640, 640).
interval (int) – The interval of change image size. Default: 10.
device (torch.device | str) – device for returned tensors. Default: ‘cuda’.
easycv.hooks.tensorboard module¶
- class easycv.hooks.tensorboard.TensorboardLoggerHookV2(log_dir=None, interval=10, ignore_last=True, reset_flag=False, by_epoch=True)[source]¶
Bases:
mmcv.runner.hooks.logger.tensorboard.TensorboardLoggerHook
easycv.hooks.wandb module¶
- class easycv.hooks.wandb.WandbLoggerHookV2(init_kwargs=None, interval=10, ignore_last=True, reset_flag=False, commit=True, by_epoch=True, with_step=True)[source]¶
Bases:
mmcv.runner.hooks.logger.wandb.WandbLoggerHook
easycv.hooks.yolox_lr_hook module¶
- class easycv.hooks.yolox_lr_hook.YOLOXLrUpdaterHook(num_last_epochs, **kwargs)[source]¶
Bases:
mmcv.runner.hooks.lr_updater.CosineAnnealingLrUpdaterHook
YOLOX learning rate scheme.
There are two main differences between YOLOXLrUpdaterHook and CosineAnnealingLrUpdaterHook.
- When the current running epoch is greater than max_epoch - last_epoch, a fixed learning rate will be used.
- The exp warmup scheme is different from LrUpdaterHook in MMCV.
- Parameters
num_last_epochs (int) – The number of epochs with a fixed learning rate before the end of the training.
easycv.hooks.yolox_mode_switch_hook module¶
- class easycv.hooks.yolox_mode_switch_hook.YOLOXModeSwitchHook(no_aug_epochs=15, skip_type_keys=('MMMosaic', 'MMRandomAffine', 'MMMixUp'), **kwargs)[source]¶
Bases:
mmcv.runner.hooks.hook.Hook
Switch the mode of YOLOX during training.
This hook turns off the mosaic and mixup data augmentation and switches to use L1 loss in bbox_head.
- Parameters
no_aug_epochs – The number of epochs at the end of training during which the data augmentation is turned off and L1 loss is used. Default: 15.
easycv.predictors package¶
Submodules¶
easycv.predictors.base module¶
easycv.predictors.classifier module¶
- class easycv.predictors.classifier.TorchClassifier(model_path, model_config=None, topk=1, label_map_path=None)[source]¶
Bases:
easycv.predictors.interface.PredictorInterface
- __init__(model_path, model_config=None, topk=1, label_map_path=None)[source]¶
init model
- Parameters
model_path – model file path
model_config – config string for model to init, in json format
- get_output_type()[source]¶
In this function the user should return a type dict, which indicates to which type the output of the predictor should be converted:
type json: data will be serialized to a json str.
type image: data will be converted to encoded image binary and written to an oss file, whose name is output_dir/${key}/${input_filename}_${idx}.jpg, where input_filename is the base filename extracted from the url, and key corresponds to the key in the dict of output_type; if the data indexed by key is a list, idx is the index of the element in the list, otherwise ${idx} will be empty.
type video: data will be converted to encoded video binary and written to an oss file.
For example, returning {‘image’: ‘image’, ‘feature’: ‘json’} indicates that the image data in the output dict will be saved to an image file and the feature in the output dict will be converted to json.
- predict(input_data_list, batch_size=- 1)[source]¶
Run the session to predict a number of samples using batch_size.
- Parameters
input_data_list – a list of numpy array, each array is a sample to be predicted
batch_size – batch_size passed by the caller, you can also ignore this param and use a fixed number if you do not want to adjust batch_size in runtime
- Returns
- a list of dict, each dict is the prediction result of one sample
eg, {“output1”: value1, “output2”: value2}, the value type can be python int str float, and numpy array
- Return type
result
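A minimal usage sketch for the classification predictor; the model path is a placeholder, and converting the image to RGB is an assumption since the docstring does not state a channel order:
import cv2
from easycv.predictors.classifier import TorchClassifier

predictor = TorchClassifier(
    model_path='work_dirs/classification/epoch_100_export.pt',  # placeholder path
    topk=1)

img = cv2.cvtColor(cv2.imread('demo/test.jpg'), cv2.COLOR_BGR2RGB)  # assumed RGB input
results = predictor.predict([img], batch_size=1)
print(results[0])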
easycv.predictors.detector module¶
- class easycv.predictors.detector.TorchYoloXPredictor(model_path, max_det=100, score_thresh=0.5, model_config=None)[source]¶
Bases:
easycv.predictors.interface.PredictorInterface
- __init__(model_path, max_det=100, score_thresh=0.5, model_config=None)[source]¶
init model
- Parameters
model_path – model file path
max_det – maximum number of detection
score_thresh – score_thresh to filter box
model_config – config string for model to init, in json format
- predict(input_data_list, batch_size=- 1)[source]¶
Run the session to predict a number of samples using batch_size.
- Parameters
input_data_list – a list of numpy array(in rgb order), each array is a sample to be predicted
batch_size – batch_size passed by the caller, you can also ignore this param and use a fixed number if you do not want to adjust batch_size in runtime
- Returns
- a list of dict, each dict is the prediction result of one sample
eg, {“output1”: value1, “output2”: value2}, the value type can be python int str float, and numpy array
- Return type
result
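A minimal usage sketch for the YOLOX predictor; the model path is a placeholder and, per the docstring, the input arrays are in RGB order:
import cv2
from easycv.predictors.detector import TorchYoloXPredictor

predictor = TorchYoloXPredictor(
    model_path='work_dirs/detection/yolox/epoch_300_export.pt',  # placeholder path
    score_thresh=0.5)

img = cv2.cvtColor(cv2.imread('demo/test.jpg'), cv2.COLOR_BGR2RGB)  # RGB order
results = predictor.predict([img])
print(results[0])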
- class easycv.predictors.detector.TorchFaceDetector(model_path=None, model_config=None)[source]¶
Bases:
easycv.predictors.interface.PredictorInterface
- __init__(model_path=None, model_config=None)[source]¶
init model, add a facedetect and align for img input.
- Parameters
model_path – model file path
model_config – config string for model to init, in json format
- get_output_type()[source]¶
In this function the user should return a type dict, which indicates to which type the output of the predictor should be converted:
type json: data will be serialized to a json str.
type image: data will be converted to encoded image binary and written to an oss file, whose name is output_dir/${key}/${input_filename}_${idx}.jpg, where input_filename is the base filename extracted from the url, and key corresponds to the key in the dict of output_type; if the data indexed by key is a list, idx is the index of the element in the list, otherwise ${idx} will be empty.
type video: data will be converted to encoded video binary and written to an oss file.
For example, returning {‘image’: ‘image’, ‘feature’: ‘json’} indicates that the image data in the output dict will be saved to an image file and the feature in the output dict will be converted to json.
- predict(input_data_list, batch_size=- 1, threshold=0.95)[source]¶
Run the session to predict a number of samples using batch_size.
- Parameters
input_data_list – a list of numpy array, each array is a sample to be predicted
batch_size – batch_size passed by the caller, you can also ignore this param and use a fixed number if you do not want to adjust batch_size in runtime
- Returns
- a list of dict, each dict is the prediction result of one sample
eg, {“output1”: value1, “output2”: value2}, the value type can be python int str float, and numpy array
- Return type
result
- Raises
if the number of detected faces in an image is not 1, nothing is done for this image –
- class easycv.predictors.detector.TorchYoloXClassifierPredictor(models_root_dir, max_det=100, cls_score_thresh=0.01, det_model_config=None, cls_model_config=None)[source]¶
Bases:
easycv.predictors.interface.PredictorInterface
- __init__(models_root_dir, max_det=100, cls_score_thresh=0.01, det_model_config=None, cls_model_config=None)[source]¶
init model, add a yolox and classification predictor for img input.
- Parameters
models_root_dir – root directory that contains models_root_dir/detection/*.pth and models_root_dir/classification/*.pth
det_model_config – config string for detection model to init, in json format
cls_model_config – config string for classification model to init, in json format
- predict(input_data_list, batch_size=- 1)[source]¶
Run the session to predict a number of samples using batch_size.
- Parameters
input_data_list – a list of numpy array(in rgb order), each array is a sample to be predicted
batch_size – batch_size passed by the caller, you can also ignore this param and use a fixed number if you do not want to adjust batch_size in runtime
- Returns
- a list of dict, each dict is the prediction result of one sample
eg, {“output1”: value1, “output2”: value2}, the value type can be python int str float, and numpy array
- Return type
result
easycv.predictors.feature_extractor module¶
- class easycv.predictors.feature_extractor.TorchFeatureExtractor(model_path, model_config=None)[source]¶
Bases:
easycv.predictors.interface.PredictorInterface
- __init__(model_path, model_config=None)[source]¶
init model
- Parameters
model_path – model file path
model_config – config string for model to init, in json format
- get_output_type()[source]¶
In this function the user should return a type dict, which indicates to which type the output of the predictor should be converted:
type json: data will be serialized to a json str.
type image: data will be converted to encoded image binary and written to an oss file, whose name is output_dir/${key}/${input_filename}_${idx}.jpg, where input_filename is the base filename extracted from the url, and key corresponds to the key in the dict of output_type; if the data indexed by key is a list, idx is the index of the element in the list, otherwise ${idx} will be empty.
type video: data will be converted to encoded video binary and written to an oss file.
For example, returning {‘image’: ‘image’, ‘feature’: ‘json’} indicates that the image data in the output dict will be saved to an image file and the feature in the output dict will be converted to json.
- predict(input_data_list, batch_size=- 1)[source]¶
Run the session to predict a number of samples using batch_size.
- Parameters
input_data_list – a list of numpy array, each array is a sample to be predicted
batch_size – batch_size passed by the caller, you can also ignore this param and use a fixed number if you do not want to adjust batch_size in runtime
- Returns
- a list of dict, each dict is the prediction result of one sample
eg, {“output1”: value1, “output2”: value2}, the value type can be python int str float, and numpy array
- Return type
result
- class easycv.predictors.feature_extractor.TorchFaceFeatureExtractor(model_path, model_config=None)[source]¶
Bases:
easycv.predictors.interface.PredictorInterface
- __init__(model_path, model_config=None)[source]¶
init model, add a facedetect and align for img input.
- Parameters
model_path – model file path
model_config – config string for model to init, in json format
- get_output_type()[source]¶
In this function the user should return a type dict, which indicates to which type the output of the predictor should be converted:
type json: data will be serialized to a json str.
type image: data will be converted to encoded image binary and written to an oss file, whose name is output_dir/${key}/${input_filename}_${idx}.jpg, where input_filename is the base filename extracted from the url, and key corresponds to the key in the dict of output_type; if the data indexed by key is a list, idx is the index of the element in the list, otherwise ${idx} will be empty.
type video: data will be converted to encoded video binary and written to an oss file.
For example, returning {‘image’: ‘image’, ‘feature’: ‘json’} indicates that the image data in the output dict will be saved to an image file and the feature in the output dict will be converted to json.
- predict(input_data_list, batch_size=- 1, detect_and_align=True)[source]¶
Run the session to predict a number of samples using batch_size.
- Parameters
input_data_list – a list of numpy array or PIL.Image, each array is a sample to be predicted
batch_size – batch_size passed by the caller, you can also ignore this param and use a fixed number if you do not want to adjust batch_size in runtime
detect_and_align – True to detect and align before feature extractor
- Returns
- a list of dict, each dict is the prediction result of one sample
eg, {“output1”: value1, “output2”: value2}, the value type can be python int str float, and numpy array
- Return type
result
- Raises
if the number of detected faces in an image is not 1, nothing is done for this image –
- class easycv.predictors.feature_extractor.TorchMultiFaceFeatureExtractor(model_path, model_config=None)[source]¶
Bases:
easycv.predictors.interface.PredictorInterface
- __init__(model_path, model_config=None)[source]¶
init model, add a facedetect and align for img input.
- Parameters
model_path – model file path
model_config – config string for model to init, in json format
- get_output_type()[source]¶
In this function the user should return a type dict, which indicates to which type the output of the predictor should be converted:
type json: data will be serialized to a json str.
type image: data will be converted to encoded image binary and written to an oss file, whose name is output_dir/${key}/${input_filename}_${idx}.jpg, where input_filename is the base filename extracted from the url, and key corresponds to the key in the dict of output_type; if the data indexed by key is a list, idx is the index of the element in the list, otherwise ${idx} will be empty.
type video: data will be converted to encoded video binary and written to an oss file.
For example, returning {‘image’: ‘image’, ‘feature’: ‘json’} indicates that the image data in the output dict will be saved to an image file and the feature in the output dict will be converted to json.
- predict(input_data_list, batch_size=- 1, detect_and_align=True)[source]¶
Run the session to predict a number of samples using batch_size.
- Parameters
input_data_list – a list of numpy array or PIL.Image, each array is a sample to be predicted
batch_size – batch_size passed by the caller, you can also ignore this param and use a fixed number if you do not want to adjust batch_size in runtime
detect_and_align – True to detect and align before feature extractor
- Returns
- a list of dict, each dict is the prediction result of one sample
eg, {“output1”: value1, “output2”: value2}, the value type can be python int str float, and numpy array
- Return type
result
- Raises
if the number of detected faces in an image is not 1, nothing is done for this image –
- class easycv.predictors.feature_extractor.TorchFaceAttrExtractor(model_path, model_config=None, face_threshold=0.95, attr_method=['distribute_sum', 'softmax', 'softmax'], attr_name=['age', 'gender', 'emo'])[source]¶
Bases:
easycv.predictors.interface.PredictorInterface
- __init__(model_path, model_config=None, face_threshold=0.95, attr_method=['distribute_sum', 'softmax', 'softmax'], attr_name=['age', 'gender', 'emo'])[source]¶
init model
- Parameters
model_path – model file path
model_config – config string for model to init, in json format
attr_method –
softmax: do softmax for feature_dim 1
distribute_sum: do softmax and prob sum
- get_output_type()[source]¶
In this function the user should return a type dict, which indicates to which type the output of the predictor should be converted:
type json: data will be serialized to a json str.
type image: data will be converted to encoded image binary and written to an oss file, whose name is output_dir/${key}/${input_filename}_${idx}.jpg, where input_filename is the base filename extracted from the url, and key corresponds to the key in the dict of output_type; if the data indexed by key is a list, idx is the index of the element in the list, otherwise ${idx} will be empty.
type video: data will be converted to encoded video binary and written to an oss file.
For example, returning {‘image’: ‘image’, ‘feature’: ‘json’} indicates that the image data in the output dict will be saved to an image file and the feature in the output dict will be converted to json.
- predict(input_data_list, batch_size=- 1)[source]¶
Run the session to predict a number of samples using batch_size.
- Parameters
input_data_list – a list of numpy array, each array is a sample to be predicted
batch_size – batch_size passed by the caller, you can also ignore this param and use a fixed number if you do not want to adjust batch_size in runtime
- Returns
- a list of dict, each dict is the prediction result of one sample
eg, {“output1”: value1, “output2”: value2}, the value type can be python int str float, and numpy array
- Return type
result
easycv.predictors.interface module¶
- class easycv.predictors.interface.PredictorInterface(model_path, model_config=None)[source]¶
Bases:
object
- version = 1¶
- __init__(model_path, model_config=None)[source]¶
init model
- Parameters
model_path – init model from this directory
model_config – config string for model to init, in json format
- abstract predict(input_data, batch_size)[source]¶
Run the session to predict a number of samples using batch_size.
- Parameters
input_data – a list of numpy array, each array is a sample to be predicted
batch_size – batch_size passed by the caller, you can also ignore this param and use a fixed number if you do not want to adjust batch_size in runtime
- Returns
- a list of dict, each dict is the prediction result of one sample
eg, {“output1”: value1, “output2”: value2}, the value type can be python int str float, and numpy array
- Return type
result
- get_output_type()[source]¶
In this function the user should return a type dict indicating which type each output of the predictor should be converted to:
type json: data will be serialized to a json str
type image: data will be converted to an encoded image binary and written to an oss file named output_dir/${key}/${input_filename}_${idx}.jpg, where input_filename is extracted from the url and key corresponds to the key in the output_type dict; if the data indexed by key is a list, idx is the index of the element in the list, otherwise ${idx} is empty
type video: data will be converted to an encoded video binary and written to an oss file
For example, return {‘image’: ‘image’, ‘feature’: ‘json’} to indicate that the image data in the output dict will be saved to an image file and the feature in the output dict will be converted to json.
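To illustrate the intended contract, here is a minimal sketch of a custom predictor; MyPredictor and its internals are hypothetical and only follow the documented interface:

import numpy as np
from easycv.predictors.interface import PredictorInterface

class MyPredictor(PredictorInterface):
    def __init__(self, model_path, model_config=None):
        # load your model from model_path here
        self.model_path = model_path

    def predict(self, input_data, batch_size):
        # each sample is a numpy array; return one result dict per sample
        return [{'feature': np.zeros(8).tolist(), 'label': 0} for _ in input_data]

    def get_output_type(self):
        # serialize the 'feature' output to json
        return {'feature': 'json'}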
- class easycv.predictors.interface.PredictorInterfaceV2(model_path, model_config=None)[source]¶
Bases:
easycv.predictors.interface.PredictorInterface
- version = 2¶
- __init__(model_path, model_config=None)[source]¶
init model
- Parameters
model_path – init model from this directory
model_config – config string for model to init, in json format
- get_output_type()[source]¶
In this function the user should return a type dict indicating which type each output of the predictor should be converted to:
type json: data will be serialized to a json str
type image: data will be converted to an encoded image binary and written to an oss file named output_dir/${key}/${input_filename}_${idx}.jpg, where input_filename is the base filename extracted from the url and key corresponds to the key in the output_type dict; if the data indexed by key is a list, idx is the index of the element in the list, otherwise ${idx} is empty
type video: data will be converted to an encoded video binary and written to an oss file
For example, return {‘image’: ‘image’, ‘feature’: ‘json’} to indicate that the image data in the output dict will be saved to an image file and the feature in the output dict will be converted to json.
- abstract predict(input_data_dict_list, batch_size)[source]¶
Run prediction on a number of samples using the given batch_size.
- Parameters
input_data_dict_list – a list of dicts, each dict being a sample of data to be predicted
batch_size – batch_size passed by the caller; you can also ignore this param and use a fixed number if you do not want to adjust batch_size at runtime
- Returns
- a list of dicts, each dict being the prediction result of one sample,
e.g. {“output1”: value1, “output2”: value2}; the value type can be a Python int, str, float, or numpy array
- Return type
result
easycv.predictors.pose_predictor module¶
- class easycv.predictors.pose_predictor.LoadImage(color_type='color', channel_order='rgb')[source]¶
Bases:
object
A simple pipeline to load an image.
- class easycv.predictors.pose_predictor.OutputHook(module, outputs=None, as_tensor=False)[source]¶
Bases:
object
- class easycv.predictors.pose_predictor.TorchPoseTopDownPredictor(model_path, model_config=None)[source]¶
Bases:
easycv.predictors.interface.PredictorInterface
Inference a single image with a list of bounding boxes.
- __init__(model_path, model_config=None)[source]¶
init model
- Parameters
model_path – model file path
model_config – config string for model to init, in json format
- predict(input_data_list, batch_size=- 1, return_heatmap=False)[source]¶
Inference pose.
- Parameters
input_data_list –
A list of image infos, like:
[
  {
    ‘img’ (str | np.ndarray, RGB): Image filename or loaded image.
    ‘detection_results’ (list | np.ndarray): All bounding boxes (with scores), shaped (N, 4) or (N, 5), in format (left, top, width, height, [score]), where N is the number of bounding boxes.
  },
  ...
]
batch_size – batch size
return_heatmap – return heatmap value or not, default false.
- Returns
- {
‘pose_results’: list of ndarray[N x K x 3], the predicted poses (x, y, score),
‘pose_heatmap’ (optional): list of heatmaps [N, K, H, W], the model output heatmaps
}
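A minimal usage sketch; the checkpoint and image paths are placeholders and the bounding box values are toy numbers:

import numpy as np
from easycv.predictors.pose_predictor import TorchPoseTopDownPredictor

predictor = TorchPoseTopDownPredictor(model_path='pose_model.pth')   # placeholder path

input_data_list = [{
    'img': 'person.jpg',                                           # filename or RGB np.ndarray
    'detection_results': np.array([[50., 40., 120., 300., 0.9]]),  # (left, top, w, h, score)
}]
output = predictor.predict(input_data_list)
print(output['pose_results'][0].shape)   # (N, K, 3): x, y, score per keypoint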
- class easycv.predictors.pose_predictor.TorchPoseTopDownPredictorWithDetector(model_path, model_config={'detection': {'model_type': None, 'reserved_classes': [], 'score_thresh': 0.0}, 'pose': {'bbox_thr': 0.3, 'format': 'xywh'}})[source]¶
Bases:
easycv.predictors.interface.PredictorInterface
- SUPPORT_DETECTION_PREDICTORS = {'TorchYoloXPredictor': <class 'easycv.predictors.detector.TorchYoloXPredictor'>}¶
- __init__(model_path, model_config={'detection': {'model_type': None, 'reserved_classes': [], 'score_thresh': 0.0}, 'pose': {'bbox_thr': 0.3, 'format': 'xywh'}})[source]¶
init model
- Parameters
model_path – pose and detection model file paths, separated by ','; make sure the first is the pose model and the second is the detection model
model_config – config string for model to init, in json format
- predict(input_data_list, batch_size=- 1, return_heatmap=False)[source]¶
Inference with pose model and detection model.
- Parameters
input_data_list – A list of images(np.ndarray, RGB)
batch_size – batch size
return_heatmap – return heatmap value or not, default false.
- Returns
- {
‘pose_results’: list of ndarray[N x K x 3], the predicted poses (x, y, score),
‘pose_heatmap’ (optional): list of heatmaps [N, K, H, W], the model output heatmaps
}
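A minimal sketch chaining detection and pose estimation; the checkpoint paths are placeholders and the model_config values (e.g. reserved_classes=['person'], score_thresh=0.5) are illustrative assumptions, not required settings:

import cv2
from easycv.predictors.pose_predictor import TorchPoseTopDownPredictorWithDetector

predictor = TorchPoseTopDownPredictorWithDetector(
    model_path='pose_model.pth,yolox_model.pth',   # pose model first, detection model second
    model_config={
        'detection': {'model_type': 'TorchYoloXPredictor',
                      'reserved_classes': ['person'],
                      'score_thresh': 0.5},
        'pose': {'bbox_thr': 0.3, 'format': 'xywh'},
    })

img = cv2.cvtColor(cv2.imread('street.jpg'), cv2.COLOR_BGR2RGB)   # RGB image (placeholder file)
output = predictor.predict([img])
print(len(output['pose_results']))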
- easycv.predictors.pose_predictor.vis_pose_result(model, img, result, radius=4, thickness=1, kpt_score_thr=0.3, bbox_color='green', dataset_info=None, show=False, out_file=None)[source]¶
Visualize the detection results on the image.
- Parameters
model (nn.Module) – The loaded detector.
img (str | np.ndarray) – Image filename or loaded image.
result (list[dict]) – The results to draw over img (bbox_result, pose_result).
radius (int) – Radius of circles.
thickness (int) – Thickness of lines.
kpt_score_thr (float) – The threshold to visualize the keypoints.
skeleton (list[tuple()]) – Default None.
show (bool) – Whether to show the image. Default: False.
out_file (str|None) – The filename of the output visualization image.
easycv.core package¶
Subpackages¶
easycv.core.evaluation package¶
Subpackages¶
easycv.core.evaluation.custom_cocotools package¶
- class easycv.core.evaluation.custom_cocotools.cocoeval.COCOeval(cocoGt=None, cocoDt=None, iouType='segm', sigmas=None)[source]¶
Bases:
object
- __init__(cocoGt=None, cocoDt=None, iouType='segm', sigmas=None)[source]¶
Initialize CocoEval using coco APIs for gt and dt.
- Parameters
cocoGt – coco object with ground truth annotations
cocoDt – coco object with detection results
iouType – type of iou to be computed, bbox for the detection task, segm for the segmentation task
sigmas – keypoint labelling sigmas.
- Returns
None
- evaluate()[source]¶
Run per image evaluation on given images and store results (a list of dicts) in self.evalImgs.
- Returns
None
- evaluateImg(imgId, catId, aRng, maxDet)[source]¶
Perform evaluation for a single category and image.
- Parameters
imgId – image id, string
catId – category id, string
aRng – area range, tuple
maxDet – maximum detection number
- Returns
dict (single image results)
- accumulate(p=None)[source]¶
Accumulate per image evaluation results and store the result in self.eval.
- Parameters
p – input params for evaluation
- Returns
None
- summarize()[source]¶
Compute and display summary metrics for evaluation results. Note this function can only be applied on the default parameter setting.
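A minimal sketch of the usual evaluate/accumulate/summarize flow, assuming COCO-format annotation and detection-result JSON files already exist on disk (the file paths below are placeholders):

from xtcocotools.coco import COCO
from easycv.core.evaluation.custom_cocotools.cocoeval import COCOeval

coco_gt = COCO('data/coco/annotations/instances_val2017.json')
coco_dt = coco_gt.loadRes('detections_val2017.json')   # placeholder result file

coco_eval = COCOeval(coco_gt, coco_dt, iouType='bbox')
coco_eval.evaluate()     # per-image evaluation, stored in evalImgs
coco_eval.accumulate()   # accumulate per-image results into eval
coco_eval.summarize()    # print the standard COCO summary metrics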
Submodules¶
easycv.core.evaluation.ap module¶
- easycv.core.evaluation.ap.ap_per_class(tp, conf, pred_cls, target_cls, plot=False, fname='precision-recall_curve.png')[source]¶
Compute the average precision, given the recall and precision curves. Source: https://github.com/rafaelpadilla/Object-Detection-Metrics.
- Parameters
tp – True positives (nparray, nx1 or nx10).
conf – Objectness value from 0-1 (nparray).
pred_cls – Predicted object classes (nparray).
target_cls – True object classes (nparray).
plot – Plot precision-recall curve at mAP@0.5
fname – Plot filename
- Returns
The average precision as computed in py-faster-rcnn.
- easycv.core.evaluation.ap.compute_ap(recall, precision)[source]¶
Compute the average precision, given the recall and precision curves. Source: https://github.com/rbgirshick/py-faster-rcnn.
- Parameters
recall – The recall curve (list).
precision – The precision curve (list).
- Returns
The average precision as computed in py-faster-rcnn.
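A toy example of compute_ap on hand-made recall/precision curves (the numbers are illustrative only):

from easycv.core.evaluation.ap import compute_ap

recall = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
precision = [1.0, 1.0, 0.8, 0.7, 0.5, 0.4]
ap = compute_ap(recall, precision)
print(ap)   # average precision under the interpolated precision-recall curve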
easycv.core.evaluation.auc_eval module¶
- class easycv.core.evaluation.auc_eval.AucEvaluator(dataset_name=None, metric_names=['neck_auc'], neck_num=None)[source]¶
Bases:
easycv.core.evaluation.base_evaluator.Evaluator
AUC evaluator for binary classification only.
easycv.core.evaluation.base_evaluator module¶
- class easycv.core.evaluation.base_evaluator.Evaluator(dataset_name=None, metric_names=[])[source]¶
Bases:
object
Evaluator interface
- __init__(dataset_name=None, metric_names=[])[source]¶
Construct eval ops from tensor
- Parameters
dataset_name (str) – dataset name to be evaluated
metric_names (List[str]) – metric names this evaluator will return
- property metric_names¶
easycv.core.evaluation.builder module¶
easycv.core.evaluation.classification_eval module¶
- class easycv.core.evaluation.classification_eval.ClsEvaluator(topk=(1, 5), dataset_name=None, metric_names=['neck_top1'], neck_num=None)[source]¶
Bases:
easycv.core.evaluation.base_evaluator.Evaluator
Classification evaluator.
- __init__(topk=(1, 5), dataset_name=None, metric_names=['neck_top1'], neck_num=None)[source]¶
- Parameters
topk – tuple of int, evaluate top-k accuracy
dataset_name – eval dataset name
metric_names – eval metric names
neck_num – some models contain multiple necks to support multitask learning; neck_num specifies which neck output of the model to use for evaluation
easycv.core.evaluation.coco_evaluation module¶
Class for evaluating object detections with COCO metrics.
- class easycv.core.evaluation.coco_evaluation.CocoDetectionEvaluator(classes, include_metrics_per_category=False, all_metrics_per_category=False, coco_analyze=False, dataset_name=None, metric_names=['DetectionBoxes_Precision/mAP'])[source]¶
Bases:
easycv.core.evaluation.base_evaluator.Evaluator
Class to evaluate COCO detection metrics.
- __init__(classes, include_metrics_per_category=False, all_metrics_per_category=False, coco_analyze=False, dataset_name=None, metric_names=['DetectionBoxes_Precision/mAP'])[source]¶
Constructor.
- Parameters
classes – a list of class name
include_metrics_per_category – If True, include metrics for each category.
all_metrics_per_category – Whether to include all the summary metrics for each category in per_category_ap. Be careful with setting it to true if you have more than a handful of categories, because it will pollute your mldash.
coco_analyze – If True, will analyze the detection result using coco analysis.
dataset_name – If not None, dataset_name will be inserted to each metric name.
- add_single_ground_truth_image_info(image_id, groundtruth_dict)[source]¶
Adds groundtruth for a single image to be used for evaluation.
If the image has already been added, a warning is logged, and groundtruth is ignored.
- Parameters
image_id – A unique string/integer identifier for the image.
groundtruth_dict –
A dictionary containing
- InputDataFields.groundtruth_boxes
float32 numpy array of shape [num_boxes, 4] containing num_boxes groundtruth boxes of the format [ymin, xmin, ymax, xmax] in absolute image coordinates.
- InputDataFields.groundtruth_classes
integer numpy array of shape [num_boxes] containing 1-indexed groundtruth classes for the boxes. InputDataFields.groundtruth_is_crowd (optional): integer numpy array of shape [num_boxes] containing iscrowd flag for groundtruth boxes.
- add_single_detected_image_info(image_id, detections_dict)[source]¶
Adds detections for a single image to be used for evaluation.
If a detection has already been added for this image id, a warning is logged, and the detection is skipped.
- Parameters
image_id – A unique string/integer identifier for the image.
detections_dict –
A dictionary containing
- DetectionResultFields.detection_boxes
float32 numpy array of shape [num_boxes, 4] containing num_boxes detection boxes of the format [ymin, xmin, ymax, xmax] in absolute image coordinates.
- DetectionResultFields.detection_scores
float32 numpy array of shape [num_boxes] containing detection scores for the boxes.
- DetectionResultFields.detection_classes
integer numpy array of shape [num_boxes] containing 1-indexed detection classes for the boxes.
- Raises
ValueError – If groundtruth for the image_id is not available.
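A minimal sketch of feeding one image's groundtruth and detections to the evaluator; the boxes are toy values in [ymin, xmin, ymax, xmax] absolute pixels, and the plain string keys ('groundtruth_boxes', 'detection_boxes', ...) are assumed to be the values of the InputDataFields / DetectionResultFields constants mentioned above:

import numpy as np
from easycv.core.evaluation.coco_evaluation import CocoDetectionEvaluator

evaluator = CocoDetectionEvaluator(classes=['person', 'car'])

evaluator.add_single_ground_truth_image_info(
    image_id='img_0001',
    groundtruth_dict={
        'groundtruth_boxes': np.array([[10., 20., 110., 220.]], dtype=np.float32),
        'groundtruth_classes': np.array([1]),   # 1-indexed class ids
    })

evaluator.add_single_detected_image_info(
    image_id='img_0001',
    detections_dict={
        'detection_boxes': np.array([[12., 18., 108., 215.]], dtype=np.float32),
        'detection_scores': np.array([0.9], dtype=np.float32),
        'detection_classes': np.array([1]),
    })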
- class easycv.core.evaluation.coco_evaluation.CocoMaskEvaluator(classes, include_metrics_per_category=False, dataset_name=None, metric_names=['DetectionMasks_Precision/mAP'])[source]¶
Bases:
easycv.core.evaluation.base_evaluator.Evaluator
Class to evaluate COCO detection metrics.
- __init__(classes, include_metrics_per_category=False, dataset_name=None, metric_names=['DetectionMasks_Precision/mAP'])[source]¶
Constructor.
- Parameters
categories – A list of dicts, each of which has the following keys: ‘id’: (required) an integer id uniquely identifying this category; ‘name’: (required) string representing the category name, e.g., ‘cat’, ‘dog’.
include_metrics_per_category – If True, include metrics for each category.
- add_single_ground_truth_image_info(image_id, groundtruth_dict)[source]¶
Adds groundtruth for a single image to be used for evaluation.
If the image has already been added, a warning is logged, and groundtruth is ignored.
- Parameters
image_id – A unique string/integer identifier for the image.
groundtruth_dict –
A dictionary containing
- InputDataFields.groundtruth_boxes
float32 numpy array of shape [num_boxes, 4] containing num_boxes groundtruth boxes of the format [ymin, xmin, ymax, xmax] in absolute image coordinates.
- InputDataFields.groundtruth_classes
integer numpy array of shape [num_boxes] containing 1-indexed groundtruth classes for the boxes.
- InputDataFields.groundtruth_instance_masks
uint8 numpy array of shape [num_boxes, image_height, image_width] containing groundtruth masks corresponding to the boxes. The elements of the array must be in {0, 1}.
- add_single_detected_image_info(image_id, detections_dict)[source]¶
Adds detections for a single image to be used for evaluation.
If a detection has already been added for this image id, a warning is logged, and the detection is skipped.
- Parameters
image_id – A unique string/integer identifier for the image.
detections_dict –
A dictionary containing
- DetectionResultFields.detection_scores
float32 numpy array of shape [num_boxes] containing detection scores for the boxes.
- DetectionResultFields.detection_classes
integer numpy array of shape [num_boxes] containing 1-indexed detection classes for the boxes.
- DetectionResultFields.detection_masks
optional uint8 numpy array of shape [num_boxes, image_height, image_width] containing instance masks corresponding to the boxes. The elements of the array must be in {0, 1}.
- Raises
ValueError – If groundtruth for the image_id is not available or if spatial shapes of groundtruth_instance_masks and detection_masks are incompatible.
- class easycv.core.evaluation.coco_evaluation.CoCoPoseTopDownEvaluator(dataset_name=None, metric_names=['AP'], **kwargs)[source]¶
Bases:
easycv.core.evaluation.base_evaluator.Evaluator
Class to evaluate COCO keypoint topdown metrics.
easycv.core.evaluation.coco_tools module¶
Wrappers for third party pycocotools to be used within object_detection.
Note that nothing in this file is tensorflow related, and thus it cannot be called directly as a slim metric, for example.
TODO(jonathanhuang): wrap as a slim metric in metrics.py
Usage example: given a set of images with ids in the list image_ids and corresponding lists of numpy arrays encoding groundtruth (boxes and classes) and detections (boxes, scores and classes), where elements of each list correspond to detections/annotations of a single image, then evaluation (in multi-class mode) can be invoked as follows:
groundtruth_dict = coco_tools.ExportGroundtruthToCOCO(
    image_ids, groundtruth_boxes_list, groundtruth_classes_list,
    max_num_classes, output_path=None)
detections_list = coco_tools.ExportDetectionsToCOCO(
    image_ids, detection_boxes_list, detection_scores_list,
    detection_classes_list, output_path=None)
groundtruth = coco_tools.COCOWrapper(groundtruth_dict)
detections = groundtruth.LoadAnnotations(detections_list)
evaluator = coco_tools.COCOEvalWrapper(groundtruth, detections, agnostic_mode=False)
metrics = evaluator.ComputeMetrics()
- class easycv.core.evaluation.coco_tools.COCOWrapper(dataset, detection_type='bbox')[source]¶
Bases:
xtcocotools.coco.COCO
Wrapper for the pycocotools COCO class.
- __init__(dataset, detection_type='bbox')[source]¶
COCOWrapper constructor.
See http://mscoco.org/dataset/#format for a description of the format. By default, the coco.COCO class constructor reads from a JSON file. This function duplicates the same behavior but loads from a dictionary, allowing us to perform evaluation without writing to external storage.
- Parameters
dataset – a dictionary holding bounding box annotations in the COCO format.
detection_type – type of detections being wrapped. Can be one of [‘bbox’, ‘segmentation’]
- Raises
ValueError – if detection_type is unsupported.
- LoadAnnotations(annotations)[source]¶
Load annotations dictionary into COCO datastructure.
See http://mscoco.org/dataset/#format for a description of the annotations format. As above, this function replicates the default behavior of the API but does not require writing to external storage.
- Parameters
annotations – python list holding object detection results where each detection is encoded as a dict with required keys [‘image_id’, ‘category_id’, ‘score’] and one of [‘bbox’, ‘segmentation’] based on detection_type.
- Returns
a coco.COCO datastructure holding object detection annotations results
- Raises
ValueError – if annotations is not a list
ValueError – if annotations do not correspond to the images contained in self.
- class easycv.core.evaluation.coco_tools.COCOEvalWrapper(groundtruth=None, detections=None, agnostic_mode=False, iou_type='bbox')[source]¶
Bases:
easycv.core.evaluation.custom_cocotools.cocoeval.COCOeval
Wrapper for the pycocotools COCOeval class.
To evaluate, create two objects (groundtruth_dict and detections_list) using the conventions listed at http://mscoco.org/dataset/#format. Then call evaluation as follows:
groundtruth = coco_tools.COCOWrapper(groundtruth_dict)
detections = groundtruth.LoadAnnotations(detections_list)
evaluator = coco_tools.COCOEvalWrapper(groundtruth, detections, agnostic_mode=False)
metrics = evaluator.ComputeMetrics()
- __init__(groundtruth=None, detections=None, agnostic_mode=False, iou_type='bbox')[source]¶
COCOEvalWrapper constructor.
Note that for the area-based metrics to be meaningful, detection and groundtruth boxes must be in image coordinates measured in pixels.
- Parameters
groundtruth – a coco.COCO (or coco_tools.COCOWrapper) object holding groundtruth annotations
detections – a coco.COCO (or coco_tools.COCOWrapper) object holding detections
agnostic_mode – boolean (default: False). If True, evaluation ignores class labels, treating all detections as proposals.
iou_type – IOU type to use for evaluation. Supports bbox or segm.
- GetCategory(category_id)[source]¶
Fetches dictionary holding category information given category id.
- Parameters
category_id – integer id
- Returns
dictionary holding ‘id’, ‘name’.
- ComputeMetrics(include_metrics_per_category=False, all_metrics_per_category=False)[source]¶
Computes detection metrics.
- Parameters
include_metrics_per_category – If True, will include metrics per category.
all_metrics_per_category – If true, include all the summary metrics for each category in per_category_ap. Be careful with setting it to true if you have more than a handful of categories, because it will pollute your mldash.
- Returns
- a dictionary holding:
’Precision/mAP’: mean average precision over classes averaged over IOU thresholds ranging from .5 to .95 with .05 increments
’Precision/mAP@.50IOU’: mean average precision at 50% IOU
’Precision/mAP@.75IOU’: mean average precision at 75% IOU
’Precision/mAP (small)’: mean average precision for small objects (area < 32^2 pixels)
’Precision/mAP (medium)’: mean average precision for medium sized objects (32^2 pixels < area < 96^2 pixels)
’Precision/mAP (large)’: mean average precision for large objects (96^2 pixels < area < 10000^2 pixels)
’Recall/AR@1’: average recall with 1 detection
’Recall/AR@10’: average recall with 10 detections
’Recall/AR@100’: average recall with 100 detections
’Recall/AR@100 (small)’: average recall for small objects with 100 detections
’Recall/AR@100 (medium)’: average recall for medium objects with 100 detections
’Recall/AR@100 (large)’: average recall for large objects with 100 detections
per_category_ap: a dictionary holding category specific results with keys of the form ‘Precision mAP ByCategory/category’ (without the supercategory part if no supercategories exist). For backward compatibility ‘PerformanceByCategory’ is included in the output regardless of all_metrics_per_category. If evaluating in class-agnostic mode, per_category_ap is an empty dictionary.
- Return type
summary_metrics
- Raises
ValueError – If category_stats does not exist.
- Analyze()[source]¶
Analyze detection results.
- Returns
- A dictionary of analysis result images,
where the key is the image name and the value is a [H, W, 3] numpy array representing the image content. You can refer to http://cocodataset.org/#detection-eval, section 4 Analysis Code.
- easycv.core.evaluation.coco_tools.ExportSingleImageGroundtruthToCoco(image_id, next_annotation_id, category_id_set, groundtruth_boxes, groundtruth_classes, groundtruth_masks=None, groundtruth_is_crowd=None, super_categories=None)[source]¶
Export groundtruth of a single image to COCO format.
This function converts groundtruth detection annotations represented as numpy arrays to dictionaries that can be ingested by the COCO evaluation API. Note that the image_ids provided here must match the ones given to ExportSingleImageDetectionsToCoco. We assume that boxes and classes are in correspondence - that is: groundtruth_boxes[i, :], and groundtruth_classes[i] are associated with the same groundtruth annotation.
In the exported result, “area” fields are always set to the area of the groundtruth bounding box.
- Parameters
image_id – a unique image identifier either of type integer or string.
next_annotation_id – integer specifying the first id to use for the groundtruth annotations. All annotations are assigned a continuous integer id starting from this value.
category_id_set – A set of valid class ids. Groundtruth with classes not in category_id_set are dropped.
groundtruth_boxes – numpy array (float32) with shape [num_gt_boxes, 4]
groundtruth_classes – numpy array (int) with shape [num_gt_boxes]
groundtruth_masks – optional uint8 numpy array of shape [num_detections, image_height, image_width] containing detection_masks.
groundtruth_is_crowd – optional numpy array (int) with shape [num_gt_boxes] indicating whether groundtruth boxes are crowd.
super_categories – optional list of str indicating each box super category
- Returns
a list of groundtruth annotations for a single image in the COCO format.
- Raises
ValueError – if (1) groundtruth_boxes and groundtruth_classes do not have the right lengths or (2) if each of the elements inside these lists do not have the correct shapes or (3) if image_ids are not integers
- easycv.core.evaluation.coco_tools.ExportGroundtruthToCOCO(image_ids, groundtruth_boxes, groundtruth_classes, categories, output_path=None)[source]¶
Export groundtruth detection annotations in numpy arrays to COCO API.
This function converts a set of groundtruth detection annotations represented as numpy arrays to dictionaries that can be ingested by the COCO API. Inputs to this function are three lists: image ids for each groundtruth image, groundtruth boxes for each image and groundtruth classes respectively. Note that the image_ids provided here must match the ones given to the ExportDetectionsToCOCO function in order for evaluation to work properly. We assume that for each image, boxes, scores and classes are in correspondence — that is: image_id[i], groundtruth_boxes[i, :] and groundtruth_classes[i] are associated with the same groundtruth annotation.
In the exported result, “area” fields are always set to the area of the groundtruth bounding box and “iscrowd” fields are always set to 0. TODO(jonathanhuang): pass in “iscrowd” array for evaluating on COCO dataset.
- Parameters
image_ids – a list of unique image identifier either of type integer or string.
groundtruth_boxes – list of numpy arrays with shape [num_gt_boxes, 4] (note that num_gt_boxes can be different for each entry in the list)
groundtruth_classes – list of numpy arrays (int) with shape [num_gt_boxes] (note that num_gt_boxes can be different for each entry in the list)
categories –
a list of dictionaries representing all possible categories. Each dict in this list has the following keys:
’id’: (required) an integer id uniquely identifying this category
’name’: (required) string representing category name, e.g., ‘cat’, ‘dog’, ‘pizza’
’supercategory’: (optional) string representing the supercategory, e.g., ‘animal’, ‘vehicle’, ‘food’, etc.
output_path – (optional) path for exporting result to JSON
- Returns
dictionary that can be read by COCO API
- Raises
ValueError – if (1) groundtruth_boxes and groundtruth_classes do not have the right lengths or (2) if each of the elements inside these lists do not have the correct shapes or (3) if image_ids are not integers
- easycv.core.evaluation.coco_tools.ExportSingleImageDetectionBoxesToCoco(image_id, category_id_set, detection_boxes, detection_scores, detection_classes)[source]¶
Export detections of a single image to COCO format.
This function converts detections represented as numpy arrays to dictionaries that can be ingested by the COCO evaluation API. Note that the image_ids provided here must match the ones given to ExportSingleImageGroundtruthToCoco. We assume that boxes and classes are in correspondence - that is: boxes[i, :] and classes[i] are associated with the same groundtruth annotation.
- Parameters
image_id – unique image identifier either of type integer or string.
category_id_set – A set of valid class ids. Detections with classes not in category_id_set are dropped.
detection_boxes – float numpy array of shape [num_detections, 4] containing detection boxes.
detection_scores – float numpy array of shape [num_detections] containing scores for the detection boxes.
detection_classes – integer numpy array of shape [num_detections] containing the classes for detection boxes.
- Returns
a list of detection annotations for a single image in the COCO format.
- Raises
ValueError – if (1) detection_boxes, detection_scores and detection_classes do not have the right lengths or (2) if each of the elements inside these lists do not have the correct shapes or (3) if image_ids are not integers.
- easycv.core.evaluation.coco_tools.ExportSingleImageDetectionMasksToCoco(image_id, category_id_set, detection_masks, detection_scores, detection_classes)[source]¶
Export detection masks of a single image to COCO format.
This function converts detections represented as numpy arrays to dictionaries that can be ingested by the COCO evaluation API. We assume that detection_masks, detection_scores, and detection_classes are in correspondence - that is: detection_masks[i, :], detection_classes[i] and detection_scores[i]
are associated with the same annotation.
- Parameters
image_id – unique image identifier either of type integer or string.
category_id_set – A set of valid class ids. Detections with classes not in category_id_set are dropped.
detection_masks – uint8 numpy array of shape [num_detections, image_height, image_width] containing detection_masks.
detection_scores – float numpy array of shape [num_detections] containing scores for detection masks.
detection_classes – integer numpy array of shape [num_detections] containing the classes for detection masks.
- Returns
a list of detection mask annotations for a single image in the COCO format.
- Raises
ValueError – if (1) detection_masks, detection_scores and detection_classes do not have the right lengths or (2) if each of the elements inside these lists do not have the correct shapes or (3) if image_ids are not integers.
- easycv.core.evaluation.coco_tools.ExportDetectionsToCOCO(image_ids, detection_boxes, detection_scores, detection_classes, categories, output_path=None)[source]¶
Export detection annotations in numpy arrays to COCO API.
This function converts a set of predicted detections represented as numpy arrays to dictionaries that can be ingested by the COCO API. Inputs to this function are lists, consisting of boxes, scores and classes, respectively, corresponding to each image for which detections have been produced. Note that the image_ids provided here must match the ones given to the ExportGroundtruthToCOCO function in order for evaluation to work properly.
We assume that for each image, boxes, scores and classes are in correspondence — that is: detection_boxes[i, :], detection_scores[i] and detection_classes[i] are associated with the same detection.
- Parameters
image_ids – a list of unique image identifier either of type integer or string.
detection_boxes – list of numpy arrays with shape [num_detection_boxes, 4]
detection_scores – list of numpy arrays (float) with shape [num_detection_boxes]. Note that num_detection_boxes can be different for each entry in the list.
detection_classes – list of numpy arrays (int) with shape [num_detection_boxes]. Note that num_detection_boxes can be different for each entry in the list.
categories – a list of dictionaries representing all possible categories. Each dict in this list must have an integer ‘id’ key uniquely identifying this category.
output_path – (optional) path for exporting result to JSON
- Returns
list of dictionaries that can be read by COCO API, where each entry corresponds to a single detection and has keys from: [‘image_id’, ‘category_id’, ‘bbox’, ‘score’].
- Raises
ValueError – if (1) detection_boxes and detection_classes do not have the right lengths or (2) if each of the elements inside these lists do not have the correct shapes or (3) if image_ids are not integers.
- easycv.core.evaluation.coco_tools.ExportSegmentsToCOCO(image_ids, detection_masks, detection_scores, detection_classes, categories, output_path=None)[source]¶
Export segmentation masks in numpy arrays to COCO API.
This function converts a set of predicted instance masks represented as numpy arrays to dictionaries that can be ingested by the COCO API. Inputs to this function are lists, consisting of segments, scores and classes, respectively, corresponding to each image for which detections have been produced.
Note this function is recommended to use for small dataset. For large dataset, it should be used with a merge function (e.g. in map reduce), otherwise the memory consumption is large.
We assume that for each image, masks, scores and classes are in correspondence — that is: detection_masks[i, :, :, :], detection_scores[i] and detection_classes[i] are associated with the same detection.
- Parameters
image_ids – list of image ids (typically ints or strings)
detection_masks – list of numpy arrays with shape [num_detection, h, w, 1] and type uint8. The height and width should match the shape of corresponding image.
detection_scores – list of numpy arrays (float) with shape [num_detection]. Note that num_detection can be different for each entry in the list.
detection_classes – list of numpy arrays (int) with shape [num_detection]. Note that num_detection can be different for each entry in the list.
categories – a list of dictionaries representing all possible categories. Each dict in this list must have an integer ‘id’ key uniquely identifying this category.
output_path – (optional) path for exporting result to JSON
- Returns
list of dictionaries that can be read by COCO API, where each entry corresponds to a single detection and has keys from: [‘image_id’, ‘category_id’, ‘segmentation’, ‘score’].
- Raises
ValueError – if detection_masks and detection_classes do not have the right lengths or if each of the elements inside these lists do not have the correct shapes.
- easycv.core.evaluation.coco_tools.ExportKeypointsToCOCO(image_ids, detection_keypoints, detection_scores, detection_classes, categories, output_path=None)[source]¶
Exports keypoints in numpy arrays to COCO API.
This function converts a set of predicted keypoints represented as numpy arrays to dictionaries that can be ingested by the COCO API. Inputs to this function are lists, consisting of keypoints, scores and classes, respectively, corresponding to each image for which detections have been produced.
We assume that for each image, keypoints, scores and classes are in correspondence — that is: detection_keypoints[i, :, :, :], detection_scores[i] and detection_classes[i] are associated with the same detection.
- Parameters
image_ids – list of image ids (typically ints or strings)
detection_keypoints – list of numpy arrays with shape [num_detection, num_keypoints, 2] and type float32 in absolute x-y coordinates.
detection_scores – list of numpy arrays (float) with shape [num_detection]. Note that num_detection can be different for each entry in the list.
detection_classes – list of numpy arrays (int) with shape [num_detection]. Note that num_detection can be different for each entry in the list.
categories – a list of dictionaries representing all possible categories. Each dict in this list must have an integer ‘id’ key uniquely identifying this category and an integer ‘num_keypoints’ key specifying the number of keypoints the category has.
output_path – (optional) path for exporting result to JSON
- Returns
list of dictionaries that can be read by COCO API, where each entry corresponds to a single detection and has keys from: [‘image_id’, ‘category_id’, ‘keypoints’, ‘score’].
- Raises
ValueError – if detection_keypoints and detection_classes do not have the right lengths or if each of the elements inside these lists do not have the correct shapes.
easycv.core.evaluation.faceid_pair_eval module¶
- easycv.core.evaluation.faceid_pair_eval.calculate_roc(thresholds, embeddings1, embeddings2, actual_issame, nrof_folds=10, pca=0)[source]¶
- easycv.core.evaluation.faceid_pair_eval.calculate_val(thresholds, embeddings1, embeddings2, actual_issame, far_target, nrof_folds=10)[source]¶
- easycv.core.evaluation.faceid_pair_eval.faceid_evaluate(embeddings, actual_issame, nrof_folds=10, pca=0)[source]¶
Do a KFold=nrof_folds faceid pair-match test for embeddings.
- Parameters
embeddings – [N x C] input embeddings of the whole dataset
actual_issame – [N/2, 1] labels indicating whether each pair is a match
nrof_folds – KFold number
pca – if > 0, do PCA and transform embeddings to [N, pca] features
- Returns
KFold average best accuracy and best threshold
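A toy sketch of faceid_evaluate on random embeddings; real usage feeds embeddings of image pairs, and the (accuracy, threshold) unpacking below assumes the Returns note above (average best accuracy and best threshold):

import numpy as np
from easycv.core.evaluation.faceid_pair_eval import faceid_evaluate

embeddings = np.random.rand(200, 128).astype(np.float32)   # [N, C], N even (pairs)
actual_issame = np.random.rand(100) > 0.5                  # [N/2] pair-match labels
acc, threshold = faceid_evaluate(embeddings, actual_issame, nrof_folds=10)
print(acc, threshold)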
- class easycv.core.evaluation.faceid_pair_eval.FaceIDPairEvaluator(dataset_name=None, metric_names=['acc'], kfold=10, pca=0)[source]¶
Bases:
easycv.core.evaluation.base_evaluator.Evaluator
FaceIDPairEvaluator evaluator. Takes n x 2 pairs and labels, performs a k-fold threshold search, and returns the average best accuracy.
- __init__(dataset_name=None, metric_names=['acc'], kfold=10, pca=0)[source]¶
Faceid small dataset evaluator, performing pair-match validation.
- Parameters
dataset_name – faceid small validation set name, one of [lfw, agedb_30, cfp_ff, cfp_fw, calfw]
kfold – KFold for train/val split
pca – pca dimensions; if > 0, do PCA on the input features and transform them to [n, pca]
- Returns
None
easycv.core.evaluation.metric_registry module¶
- class easycv.core.evaluation.metric_registry.MetricRegistry[source]¶
Bases:
object
- register_default_best_metric(cls, metric_name, metric_cmp_op='max')[source]¶
Register default best metric for each evaluator
- Parameters
cls (object) – class object
metric_name (str or List[str]) – default best metric name
metric_cmp_op (str or List[str]) – metric compare operation, should be one of [“max”, “min”]
easycv.core.evaluation.mse_eval module¶
- class easycv.core.evaluation.mse_eval.MSEEvaluator(dataset_name=None, metric_names=['avg_mse'], neck_num=None)[source]¶
Bases:
easycv.core.evaluation.base_evaluator.Evaluator
MSE evaluator.
easycv.core.evaluation.retrival_topk_eval module¶
- class easycv.core.evaluation.retrival_topk_eval.RetrivalTopKEvaluator(topk=(1, 2, 4, 8), norm=0, metric='cos', pca=0, dataset_name=None, metric_names=['R@K=1'], save_results=False, save_results_dir='', feature_keyword=['neck'])[source]¶
Bases:
easycv.core.evaluation.base_evaluator.Evaluator
RetrivalTopK evaluator. Performs top-K retrieval by measuring the distance of each sample against all others, taking the top-K nearest neighbours, and counting ID matches (Retrieval = 1 if matched, Miss = 0). Finally, all retrieval rates are averaged.
easycv.core.evaluation.top_down_eval module¶
- easycv.core.evaluation.top_down_eval.pose_pck_accuracy(output, target, mask, thr=0.05, normalize=None)[source]¶
Calculate the pose accuracy of PCK for each individual keypoint and the averaged accuracy across all keypoints from heatmaps.
Note
PCK metric measures accuracy of the localization of the body joints. The distances between predicted positions and the ground-truth ones are typically normalized by the bounding box size. The threshold (thr) of the normalized distance is commonly set as 0.05, 0.1 or 0.2 etc.
batch_size: N num_keypoints: K heatmap height: H heatmap width: W
- Parameters
output (np.ndarray[N, K, H, W]) – Model output heatmaps.
target (np.ndarray[N, K, H, W]) – Groundtruth heatmaps.
mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.
thr (float) – Threshold of PCK calculation. Default 0.05.
normalize (np.ndarray[N, 2]) – Normalization factor for H&W.
- Returns
A tuple containing keypoint accuracy.
np.ndarray[K]: Accuracy of each keypoint.
float: Averaged accuracy across all keypoints.
int: Number of valid keypoints.
- Return type
tuple
- easycv.core.evaluation.top_down_eval.keypoint_pck_accuracy(pred, gt, mask, thr, normalize)[source]¶
Calculate the pose accuracy of PCK for each individual keypoint and the averaged accuracy across all keypoints for coordinates.
Note
PCK metric measures accuracy of the localization of the body joints. The distances between predicted positions and the ground-truth ones are typically normalized by the bounding box size. The threshold (thr) of the normalized distance is commonly set as 0.05, 0.1 or 0.2 etc.
batch_size: N num_keypoints: K
- Parameters
pred (np.ndarray[N, K, 2]) – Predicted keypoint location.
gt (np.ndarray[N, K, 2]) – Groundtruth keypoint location.
mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.
thr (float) – Threshold of PCK calculation.
normalize (np.ndarray[N, 2]) – Normalization factor for H&W.
- Returns
A tuple containing keypoint accuracy.
acc (np.ndarray[K]): Accuracy of each keypoint.
avg_acc (float): Averaged accuracy across all keypoints.
cnt (int): Number of valid keypoints.
- Return type
tuple
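A toy example of keypoint_pck_accuracy on synthetic coordinates; shapes follow the docstring above and the values are illustrative only:

import numpy as np
from easycv.core.evaluation.top_down_eval import keypoint_pck_accuracy

gt = np.random.rand(4, 17, 2) * 100            # [N, K, 2] ground-truth keypoints
pred = gt + np.random.randn(4, 17, 2)          # predictions with small noise
mask = np.ones((4, 17), dtype=bool)            # all joints visible
normalize = np.full((4, 2), 100.0)             # normalization factor for H & W

acc, avg_acc, cnt = keypoint_pck_accuracy(pred, gt, mask, thr=0.05, normalize=normalize)
print(avg_acc, cnt)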
- easycv.core.evaluation.top_down_eval.post_dark_udp(coords, batch_heatmaps, kernel=3)[source]¶
DARK post-processing, implemented by UDP. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020). Zhang et al. Distribution-Aware Coordinate Representation for Human Pose Estimation (CVPR 2020).
Note
batch size: B num keypoints: K num persons: N height of heatmaps: H width of heatmaps: W B=1 for bottom_up paradigm where all persons share the same heatmap. B=N for top_down paradigm where each person has its own heatmaps.
- Parameters
coords (np.ndarray[N, K, 2]) – Initial coordinates of human pose.
batch_heatmaps (np.ndarray[B, K, H, W]) – batch_heatmaps
kernel (int) – Gaussian kernel size (K) for modulation.
- Returns
Refined coordinates.
- Return type
res (np.ndarray[N, K, 2])
- easycv.core.evaluation.top_down_eval.keypoints_from_heatmaps(heatmaps, center, scale, unbiased=False, post_process='default', kernel=11, valid_radius_factor=0.0546875, use_udp=False, target_type='GaussianHeatmap')[source]¶
Get final keypoint predictions from heatmaps and transform them back to the image.
Note
batch size: N num keypoints: K heatmap height: H heatmap width: W
- Parameters
heatmaps (np.ndarray[N, K, H, W]) – model predicted heatmaps.
center (np.ndarray[N, 2]) – Center of the bounding box (x, y).
scale (np.ndarray[N, 2]) – Scale of the bounding box wrt height/width.
post_process (str/None) – Choice of methods to post-process heatmaps. Currently supported: None, ‘default’, ‘unbiased’, ‘megvii’.
unbiased (bool) – Option to use unbiased decoding. Mutually exclusive with megvii. Note: this arg is deprecated and unbiased=True can be replaced by post_process=’unbiased’ Paper ref: Zhang et al. Distribution-Aware Coordinate Representation for Human Pose Estimation (CVPR 2020).
kernel (int) – Gaussian kernel size (K) for modulation, which should match the heatmap gaussian sigma when training. K=17 for sigma=3 and k=11 for sigma=2.
valid_radius_factor (float) – The radius factor of the positive area in classification heatmap for UDP.
use_udp (bool) – Use unbiased data processing.
target_type (str) – ‘GaussianHeatmap’ or ‘CombinedTarget’. GaussianHeatmap: Classification target with gaussian distribution. CombinedTarget: The combination of classification target (response map) and regression target (offset map). Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).
- Returns
A tuple containing keypoint predictions and scores.
preds (np.ndarray[N, K, 2]): Predicted keypoint location in images.
maxvals (np.ndarray[N, K, 1]): Scores (confidence) of the keypoints.
- Return type
tuple
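A toy example decoding keypoints from random heatmaps; all values are illustrative, with shapes following the docstring above:

import numpy as np
from easycv.core.evaluation.top_down_eval import keypoints_from_heatmaps

heatmaps = np.random.rand(2, 17, 64, 48).astype(np.float32)   # [N, K, H, W]
center = np.array([[96., 128.], [100., 120.]])                # bbox centers (x, y)
scale = np.array([[1.0, 1.5], [0.8, 1.2]])                    # bbox scales wrt height/width

preds, maxvals = keypoints_from_heatmaps(heatmaps, center, scale)
print(preds.shape, maxvals.shape)   # (2, 17, 2) and (2, 17, 1)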
easycv.core.optimizer package¶
Submodules¶
easycv.core.optimizer.lars module¶
- class easycv.core.optimizer.lars.LARS(params, lr=<required parameter>, momentum=0, dampening=0, weight_decay=0, eta=0.001, nesterov=False)[source]¶
Bases:
torch.optim.optimizer.Optimizer
Implements layer-wise adaptive rate scaling for SGD.
- Parameters
params (iterable) – iterable of parameters to optimize or dicts defining parameter groups
lr (float) – base learning rate (gamma_0)
momentum (float, optional) – momentum factor (default: 0) (“m”)
weight_decay (float, optional) – weight decay (L2 penalty) (default: 0) (“beta”)
dampening (float, optional) – dampening for momentum (default: 0)
eta (float, optional) – LARS coefficient
nesterov (bool, optional) – enables Nesterov momentum (default: False)
Based on Algorithm 1 of the following paper by You, Gitman, and Ginsburg. Large Batch Training of Convolutional Networks:
Example
>>> optimizer = LARS(model.parameters(), lr=0.1, momentum=0.9,
>>>                  weight_decay=1e-4, eta=1e-3)
>>> optimizer.zero_grad()
>>> loss_fn(model(input), target).backward()
>>> optimizer.step()
easycv.core.optimizer.ranger module¶
- easycv.core.optimizer.ranger.centralized_gradient(x, use_gc=True, gc_conv_only=False)[source]¶
credit - https://github.com/Yonghongwei/Gradient-Centralization
- class easycv.core.optimizer.ranger.Ranger(params, lr=0.001, alpha=0.5, k=6, N_sma_threshhold=5, betas=(0.95, 0.999), eps=1e-05, weight_decay=0, use_gc=True, gc_conv_only=False, gc_loc=True)[source]¶
Bases:
torch.optim.optimizer.Optimizer
Adam+LookAhead: refer to https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer
- __init__(params, lr=0.001, alpha=0.5, k=6, N_sma_threshhold=5, betas=(0.95, 0.999), eps=1e-05, weight_decay=0, use_gc=True, gc_conv_only=False, gc_loc=True)[source]¶
Initialize self. See help(type(self)) for accurate signature.
- step(closure=None)[source]¶
Performs a single optimization step (parameter update).
- Parameters
closure (callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.
Note
Unless otherwise specified, this function should not modify the .grad field of the parameters.
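A minimal sketch wiring Ranger into a standard training step; the tiny model and random data below are placeholders:

import torch
import torch.nn as nn
from easycv.core.optimizer.ranger import Ranger

model = nn.Linear(16, 4)
optimizer = Ranger(model.parameters(), lr=1e-3, weight_decay=1e-4)

x, target = torch.randn(8, 16), torch.randint(0, 4, (8,))
loss = nn.functional.cross_entropy(model(x), target)

optimizer.zero_grad()
loss.backward()
optimizer.step()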
easycv.core.post_processing package¶
- easycv.core.post_processing.affine_transform(pt, trans_mat)[source]¶
Apply an affine transformation to the points.
- Parameters
pt (np.ndarray) – a 2 dimensional point to be transformed
trans_mat (np.ndarray) – 2x3 matrix of an affine transform
- Returns
Transformed points.
- Return type
np.ndarray
- easycv.core.post_processing.flip_back(output_flipped, flip_pairs, target_type='GaussianHeatmap')[source]¶
Flip the flipped heatmaps back to the original form.
Note
batch_size: N num_keypoints: K heatmap height: H heatmap width: W
- Parameters
output_flipped (np.ndarray[N, K, H, W]) – The output heatmaps obtained from the flipped images.
flip_pairs (list[tuple()) – Pairs of keypoints which are mirrored (for example, left ear – right ear).
target_type (str) – GaussianHeatmap or CombinedTarget
- Returns
heatmaps that flipped back to the original image
- Return type
np.ndarray
- easycv.core.post_processing.fliplr_joints(joints_3d, joints_3d_visible, img_width, flip_pairs)[source]¶
Flip human joints horizontally.
Note
num_keypoints: K
- Parameters
joints_3d (np.ndarray([K, 3])) – Coordinates of keypoints.
joints_3d_visible (np.ndarray([K, 1])) – Visibility of keypoints.
img_width (int) – Image width.
flip_pairs (list[tuple()]) – Pairs of keypoints which are mirrored (for example, left ear – right ear).
- Returns
Flipped human joints.
joints_3d_flipped (np.ndarray([K, 3])): Flipped joints.
joints_3d_visible_flipped (np.ndarray([K, 1])): Joint visibility.
- Return type
tuple
- easycv.core.post_processing.fliplr_regression(regression, flip_pairs, center_mode='static', center_x=0.5, center_index=0)[source]¶
Flip human joints horizontally.
Note
batch_size: N num_keypoint: K
- Parameters
regression (np.ndarray([..., K, C])) –
Coordinates of keypoints, where K is the joint number and C is the dimension. Example shapes are:
- [N, K, C]: a batch of keypoints, where N is the batch size.
- [N, T, K, C]: a batch of pose sequences, where T is the frame number.
flip_pairs (list[tuple()]) – Pairs of keypoints which are mirrored (for example, left ear – right ear).
center_mode (str) – The mode to set the center location on the x-axis to flip around. Options are: - static: use a static x value (see center_x also) - root: use a root joint (see center_index also)
center_x (float) – Set the x-axis location of the flip center. Only used when center_mode=static.
center_index (int) – Set the index of the root joint, whose x location will be used as the flip center. Only used when center_mode=root.
- Returns
Flipped human joints.
regression_flipped (np.ndarray([…, K, C])): Flipped joints.
- Return type
tuple
- easycv.core.post_processing.get_affine_transform(center, scale, rot, output_size, shift=(0.0, 0.0), inv=False)[source]¶
Get the affine transform matrix, given the center/scale/rot/output_size.
- Parameters
center (np.ndarray[2, ]) – Center of the bounding box (x, y).
scale (np.ndarray[2, ]) – Scale of the bounding box wrt [width, height].
rot (float) – Rotation angle (degree).
output_size (np.ndarray[2, ] | list(2,)) – Size of the destination heatmaps.
shift (0-100%) – Shift translation ratio wrt the width/height. Default (0., 0.).
inv (bool) – Option to inverse the affine transform direction. (inv=False: src->dst or inv=True: dst->src)
- Returns
The transform matrix.
- Return type
np.ndarray
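A small sketch chaining get_affine_transform and affine_transform to map a point from the original image into a 192x256 crop; the center/scale/point values are illustrative only:

import numpy as np
from easycv.core.post_processing import affine_transform, get_affine_transform

center = np.array([320., 240.])      # bbox center (x, y)
scale = np.array([1.0, 1.0])         # bbox scale wrt [width, height]
trans = get_affine_transform(center, scale, rot=0., output_size=np.array([192, 256]))

pt = np.array([330., 250.])          # a point in original image coordinates
pt_cropped = affine_transform(pt, trans)
print(trans.shape, pt_cropped)       # (2, 3) transform matrix, transformed 2D point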
- easycv.core.post_processing.get_warp_matrix(theta, size_input, size_dst, size_target)[source]¶
Calculate the transformation matrix under the constraint of unbiased. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).
- Parameters
theta (float) – Rotation angle in degrees.
size_input (np.ndarray) – Size of input image [w, h].
size_dst (np.ndarray) – Size of output image [w, h].
size_target (np.ndarray) – Size of ROI in input plane [w, h].
- Returns
A matrix for transformation.
- Return type
matrix (np.ndarray)
- easycv.core.post_processing.rotate_point(pt, angle_rad)[source]¶
Rotate a point by an angle.
- Parameters
pt (list[float]) – 2 dimensional point to be rotated
angle_rad (float) – rotation angle by radian
- Returns
Rotated point.
- Return type
list[float]
- easycv.core.post_processing.transform_preds(coords, center, scale, output_size, use_udp=False)[source]¶
Get final keypoint predictions from heatmaps and apply scaling and translation to map them back to the image.
Note
num_keypoints: K
- Parameters
coords (np.ndarray[K, ndims]) –
If ndims=2, coords are predicted keypoint locations.
If ndims=4, coords are composed of (x, y, scores, tags).
If ndims=5, coords are composed of (x, y, scores, tags, flipped_tags).
center (np.ndarray[2, ]) – Center of the bounding box (x, y).
scale (np.ndarray[2, ]) – Scale of the bounding box wrt [width, height].
output_size (np.ndarray[2, ] | list(2,)) – Size of the destination heatmaps.
use_udp (bool) – Use unbiased data processing
- Returns
Predicted coordinates in the images.
- Return type
np.ndarray
- easycv.core.post_processing.warp_affine_joints(joints, mat)[source]¶
Apply affine transformation defined by the transform matrix on the joints.
- Parameters
joints (np.ndarray[..., 2]) – Origin coordinate of joints.
mat (np.ndarray[3, 2]) – The affine matrix.
- Returns
Result coordinate of joints.
- Return type
matrix (np.ndarray[…, 2])
- easycv.core.post_processing.oks_nms(kpts_db, thr, sigmas=None, vis_thr=None)[source]¶
OKS NMS implementations.
- Parameters
kpts_db – keypoints.
thr – Retain overlap < thr.
sigmas – standard deviation of keypoint labelling.
vis_thr – threshold of the keypoint visibility.
- Returns
indexes to keep.
- Return type
np.ndarray
- easycv.core.post_processing.soft_oks_nms(kpts_db, thr, max_dets=20, sigmas=None, vis_thr=None)[source]¶
Soft OKS NMS implementations.
- Parameters
kpts_db – keypoints.
thr – retain oks overlap < thr.
max_dets – max number of detections to keep.
sigmas – Keypoint labelling uncertainty.
- Returns
indexes to keep.
- Return type
np.ndarray
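A toy sketch of oks_nms. The docstring only says kpts_db holds keypoints, so the dict fields used below ('keypoints' as a [K, 3] array with scores, plus 'score' and 'area') are assumptions based on common top-down pose pipelines:

import numpy as np
from easycv.core.post_processing import oks_nms

kpts_db = [
    {'keypoints': np.random.rand(17, 3), 'score': 0.9, 'area': 1000.0},
    {'keypoints': np.random.rand(17, 3), 'score': 0.7, 'area': 980.0},
]
keep = oks_nms(kpts_db, thr=0.9)
print(keep)   # indexes of the detections to keep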
Submodules¶
easycv.core.post_processing.nms module¶
- easycv.core.post_processing.nms.oks_iou(g, d, a_g, a_d, sigmas=None, vis_thr=None)[source]¶
Calculate oks ious.
- Parameters
g – Ground truth keypoints.
d – Detected keypoints.
a_g – Area of the ground truth object.
a_d – Area of the detected object.
sigmas – standard deviation of keypoint labelling.
vis_thr – threshold of the keypoint visibility.
- Returns
The oks ious.
- Return type
list
- easycv.core.post_processing.nms.oks_nms(kpts_db, thr, sigmas=None, vis_thr=None)[source]¶
OKS NMS implementations.
- Parameters
kpts_db – keypoints.
thr – Retain overlap < thr.
sigmas – standard deviation of keypoint labelling.
vis_thr – threshold of the keypoint visibility.
- Returns
indexes to keep.
- Return type
np.ndarray
- easycv.core.post_processing.nms.soft_oks_nms(kpts_db, thr, max_dets=20, sigmas=None, vis_thr=None)[source]¶
Soft OKS NMS implementations.
- Parameters
kpts_db – keypoints.
thr – retain oks overlap < thr.
max_dets – max number of detections to keep.
sigmas – Keypoint labelling uncertainty.
- Returns
indexes to keep.
- Return type
np.ndarray
easycv.core.post_processing.pose_transforms module¶
- easycv.core.post_processing.pose_transforms.fliplr_joints(joints_3d, joints_3d_visible, img_width, flip_pairs)[source]¶
Flip human joints horizontally.
Note
num_keypoints: K
- Parameters
joints_3d (np.ndarray([K, 3])) – Coordinates of keypoints.
joints_3d_visible (np.ndarray([K, 1])) – Visibility of keypoints.
img_width (int) – Image width.
flip_pairs (list[tuple()]) – Pairs of keypoints which are mirrored (for example, left ear – right ear).
- Returns
Flipped human joints.
joints_3d_flipped (np.ndarray([K, 3])): Flipped joints.
joints_3d_visible_flipped (np.ndarray([K, 1])): Joint visibility.
- Return type
tuple
- easycv.core.post_processing.pose_transforms.fliplr_regression(regression, flip_pairs, center_mode='static', center_x=0.5, center_index=0)[source]¶
Flip human joints horizontally.
Note
batch_size: N num_keypoint: K
- Parameters
regression (np.ndarray([..., K, C])) –
Coordinates of keypoints, where K is the joint number and C is the dimension. Example shapes are:
- [N, K, C]: a batch of keypoints, where N is the batch size.
- [N, T, K, C]: a batch of pose sequences, where T is the frame number.
flip_pairs (list[tuple()]) – Pairs of keypoints which are mirrored (for example, left ear – right ear).
center_mode (str) – The mode to set the center location on the x-axis to flip around. Options are: - static: use a static x value (see center_x also) - root: use a root joint (see center_index also)
center_x (float) – Set the x-axis location of the flip center. Only used when center_mode=static.
center_index (int) – Set the index of the root joint, whose x location will be used as the flip center. Only used when center_mode=root.
- Returns
Flipped human joints.
regression_flipped (np.ndarray([…, K, C])): Flipped joints.
- Return type
tuple
- easycv.core.post_processing.pose_transforms.flip_back(output_flipped, flip_pairs, target_type='GaussianHeatmap')[source]¶
Flip the flipped heatmaps back to the original form.
Note
batch_size: N num_keypoints: K heatmap height: H heatmap width: W
- Parameters
output_flipped (np.ndarray[N, K, H, W]) – The output heatmaps obtained from the flipped images.
flip_pairs (list[tuple()) – Pairs of keypoints which are mirrored (for example, left ear – right ear).
target_type (str) – GaussianHeatmap or CombinedTarget
- Returns
heatmaps that flipped back to the original image
- Return type
np.ndarray
- easycv.core.post_processing.pose_transforms.transform_preds(coords, center, scale, output_size, use_udp=False)[source]¶
Get final keypoint predictions from heatmaps and apply scaling and translation to map them back to the image.
Note
num_keypoints: K
- Parameters
coords (np.ndarray[K, ndims]) –
If ndims=2, coords are predicted keypoint locations.
If ndims=4, coords are composed of (x, y, scores, tags).
If ndims=5, coords are composed of (x, y, scores, tags, flipped_tags).
center (np.ndarray[2, ]) – Center of the bounding box (x, y).
scale (np.ndarray[2, ]) – Scale of the bounding box wrt [width, height].
output_size (np.ndarray[2, ] | list(2,)) – Size of the destination heatmaps.
use_udp (bool) – Use unbiased data processing
- Returns
Predicted coordinates in the images.
- Return type
np.ndarray
- easycv.core.post_processing.pose_transforms.get_affine_transform(center, scale, rot, output_size, shift=(0.0, 0.0), inv=False)[source]¶
Get the affine transform matrix, given the center/scale/rot/output_size.
- Parameters
center (np.ndarray[2, ]) – Center of the bounding box (x, y).
scale (np.ndarray[2, ]) – Scale of the bounding box wrt [width, height].
rot (float) – Rotation angle (degree).
output_size (np.ndarray[2, ] | list(2,)) – Size of the destination heatmaps.
shift (0-100%) – Shift translation ratio wrt the width/height. Default (0., 0.).
inv (bool) – Option to inverse the affine transform direction. (inv=False: src->dst or inv=True: dst->src)
- Returns
The transform matrix.
- Return type
np.ndarray
- easycv.core.post_processing.pose_transforms.affine_transform(pt, trans_mat)[source]¶
Apply an affine transformation to the points.
- Parameters
pt (np.ndarray) – a 2 dimensional point to be transformed
trans_mat (np.ndarray) – 2x3 matrix of an affine transform
- Returns
Transformed points.
- Return type
np.ndarray
- easycv.core.post_processing.pose_transforms.rotate_point(pt, angle_rad)[source]¶
Rotate a point by an angle.
- Parameters
pt (list[float]) – 2 dimensional point to be rotated
angle_rad (float) – rotation angle by radian
- Returns
Rotated point.
- Return type
list[float]
- easycv.core.post_processing.pose_transforms.get_warp_matrix(theta, size_input, size_dst, size_target)[source]¶
Calculate the transformation matrix under the constraint of unbiased. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).
- Parameters
theta (float) – Rotation angle in degrees.
size_input (np.ndarray) – Size of input image [w, h].
size_dst (np.ndarray) – Size of output image [w, h].
size_target (np.ndarray) – Size of ROI in input plane [w, h].
- Returns
A matrix for transformation.
- Return type
matrix (np.ndarray)
- easycv.core.post_processing.pose_transforms.warp_affine_joints(joints, mat)[source]¶
Apply affine transformation defined by the transform matrix on the joints.
- Parameters
joints (np.ndarray[..., 2]) – Origin coordinate of joints.
mat (np.ndarray[3, 2]) – The affine matrix.
- Returns
Result coordinate of joints.
- Return type
matrix (np.ndarray[…, 2])
easycv.core.visualization package¶
- easycv.core.visualization.imshow_bboxes(img, bboxes, labels=None, colors='green', text_color='white', font_size=20, thickness=1, font_scale=0.5, show=True, win_name='', wait_time=0, out_file=None)[source]¶
Draw bboxes with labels (optional) on an image. This is a wrapper of mmcv.imshow_bboxes.
- Parameters
img (str or ndarray) – The image to be displayed.
bboxes (ndarray) – ndarray of shape (k, 4), each row is a bbox in format [x1, y1, x2, y2].
labels (str or list[str], optional) – labels of each bbox.
colors (list[str or tuple or Color]) – A list of colors.
text_color (str or tuple or Color) – Color of texts.
font_size (int) – Size of font.
thickness (int) – Thickness of lines.
font_scale (float) – Font scales of texts.
show (bool) – Whether to show the image.
win_name (str) – The window name.
wait_time (int) – Value of waitKey param.
out_file (str, optional) – The filename to write the image.
- Returns
The image with bboxes drawn on it.
- Return type
ndarray
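A small sketch of imshow_bboxes that writes the visualization to a file instead of opening a window; 'demo.jpg' and 'demo_out.jpg' are placeholder paths:

import numpy as np
from easycv.core.visualization import imshow_bboxes

bboxes = np.array([[30, 40, 200, 180],
                   [220, 60, 350, 300]], dtype=np.float32)  # rows: [x1, y1, x2, y2]
labels = ['person', 'dog']

vis = imshow_bboxes('demo.jpg', bboxes, labels=labels, colors='green',
                    show=False, out_file='demo_out.jpg')  # returns the drawn image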
- easycv.core.visualization.imshow_keypoints(img, pose_result, skeleton=None, kpt_score_thr=0.3, pose_kpt_color=None, pose_link_color=None, radius=4, thickness=1, show_keypoint_weight=False)[source]¶
Draw keypoints and links on an image.
- Parameters
img (str or Tensor) – The image to draw poses on. If an image array is given, it will be modified in-place.
pose_result (list[kpts]) – The poses to draw. Each element kpts is a set of K keypoints as a Kx3 numpy.ndarray, where each keypoint is represented as (x, y, score).
kpt_score_thr (float, optional) – Minimum score of keypoints to be shown. Default: 0.3.
pose_kpt_color (np.array[Nx3]) – Color of N keypoints. If None, the keypoints will not be drawn.
pose_link_color (np.array[Mx3]) – Color of M links. If None, the links will not be drawn.
thickness (int) – Thickness of lines.
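A hedged sketch of imshow_keypoints drawing keypoints only (skeleton=None); the keypoint values and colors are illustrative, and the color channel order follows whatever the input image uses:

import cv2
import numpy as np
from easycv.core.visualization import imshow_keypoints

img = cv2.imread('demo.jpg')                          # placeholder path
kpts = np.array([[100, 120, 0.9],
                 [140, 118, 0.8],
                 [120, 160, 0.2]], dtype=np.float32)  # K x 3: (x, y, score)
pose_result = [kpts]                                  # one pose
kpt_colors = np.array([[0, 255, 0]] * len(kpts))      # one color per keypoint

# the image array is modified in place, as noted above; the low-score point
# (0.2 < kpt_score_thr) is skipped
imshow_keypoints(img, pose_result, skeleton=None, kpt_score_thr=0.3,
                 pose_kpt_color=kpt_colors, pose_link_color=None,
                 radius=4, thickness=1)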
- easycv.core.visualization.imshow_label(img, labels, text_color='blue', font_size=20, thickness=1, font_scale=0.5, intervel=5, show=True, win_name='', wait_time=0, out_file=None)[source]¶
Draw images with labels on an image.
- Parameters
img (str or ndarray) – The image to be displayed.
labels (str or list[str]) – labels of each image.
text_color (str or tuple or Color) – Color of texts.
font_size (int) – Size of font.
thickness (int) – Thickness of lines.
font_scale (float) – Font scales of texts.
intervel (int) – interval in pixels between multiple labels
show (bool) – Whether to show the image.
win_name (str) – The window name.
wait_time (int) – Value of waitKey param.
out_file (str, optional) – The filename to write the image.
- Returns
The image with labels drawn on it.
- Return type
ndarray
Submodules¶
easycv.core.visualization.image module¶
- easycv.core.visualization.image.put_text(img, xy, text, fill, size=20)[source]¶
Draw text on an image; supports Chinese text.
- easycv.core.visualization.image.imshow_label(img, labels, text_color='blue', font_size=20, thickness=1, font_scale=0.5, intervel=5, show=True, win_name='', wait_time=0, out_file=None)[source]¶
Draw images with labels on an image.
- Parameters
img (str or ndarray) – The image to be displayed.
labels (str or list[str]) – labels of each image.
text_color (str or tuple or Color) – Color of texts.
font_size (int) – Size of font.
thickness (int) – Thickness of lines.
font_scale (float) – Font scales of texts.
intervel (int) – interval in pixels between multiple labels
show (bool) – Whether to show the image.
win_name (str) – The window name.
wait_time (int) – Value of waitKey param.
out_file (str, optional) – The filename to write the image.
- Returns
The image with labels drawn on it.
- Return type
ndarray
- easycv.core.visualization.image.imshow_bboxes(img, bboxes, labels=None, colors='green', text_color='white', font_size=20, thickness=1, font_scale=0.5, show=True, win_name='', wait_time=0, out_file=None)[source]¶
Draw bboxes with labels (optional) on an image. This is a wrapper of mmcv.imshow_bboxes.
- Parameters
img (str or ndarray) – The image to be displayed.
bboxes (ndarray) – ndarray of shape (k, 4), each row is a bbox in format [x1, y1, x2, y2].
labels (str or list[str], optional) – labels of each bbox.
colors (list[str or tuple or Color]) – A list of colors.
text_color (str or tuple or Color) – Color of texts.
font_size (int) – Size of font.
thickness (int) – Thickness of lines.
font_scale (float) – Font scales of texts.
show (bool) – Whether to show the image.
win_name (str) – The window name.
wait_time (int) – Value of waitKey param.
out_file (str, optional) – The filename to write the image.
- Returns
The image with bboxes drawn on it.
- Return type
ndarray
- easycv.core.visualization.image.imshow_keypoints(img, pose_result, skeleton=None, kpt_score_thr=0.3, pose_kpt_color=None, pose_link_color=None, radius=4, thickness=1, show_keypoint_weight=False)[source]¶
Draw keypoints and links on an image.
- Parameters
img (str or Tensor) – The image to draw poses on. If an image array is given, it will be modified in-place.
pose_result (list[kpts]) – The poses to draw. Each element kpts is a set of K keypoints as a Kx3 numpy.ndarray, where each keypoint is represented as (x, y, score).
kpt_score_thr (float, optional) – Minimum score of keypoints to be shown. Default: 0.3.
pose_kpt_color (np.array[Nx3]) – Color of N keypoints. If None, the keypoints will not be drawn.
pose_link_color (np.array[Mx3]) – Color of M links. If None, the links will not be drawn.
thickness (int) – Thickness of lines.
Submodules¶
easycv.core.standard_fields module¶
Contains classes specifying naming conventions used for object detection.
- Specifies:
InputDataFields: standard fields used by reader/preprocessor/batcher.
DetectionResultFields: standard fields returned by the object detector.
BoxListFields: standard fields used by BoxList.
TfExampleFields: standard fields for the tf-example data format (go/tf-example).
- class easycv.core.standard_fields.InputDataFields[source]¶
Bases:
object
Names for the input tensors.
Holds the standard data field names to use for identifying input tensors. This should be used by the decoder to identify keys for the returned tensor_dict containing input tensors. And it should be used by the model to identify the tensors it needs.
- image¶
image.
- original_image¶
image in the original input size.
- key¶
unique key corresponding to image.
- source_id¶
source of the original image.
- filename¶
original filename of the dataset (without common path).
- groundtruth_image_classes¶
image-level class labels.
- groundtruth_boxes¶
coordinates of the ground truth boxes in the image.
- groundtruth_classes¶
box-level class labels.
- groundtruth_label_types¶
box-level label types (e.g. explicit negative).
- groundtruth_is_crowd¶
[DEPRECATED, use groundtruth_group_of instead] is the groundtruth a single object or a crowd.
- groundtruth_area¶
area of a groundtruth segment.
- groundtruth_difficult¶
is a difficult object
- groundtruth_group_of¶
is a group_of objects, e.g. multiple objects of the same class, forming a connected group, where instances are heavily occluding each other.
- proposal_boxes¶
coordinates of object proposal boxes.
- proposal_objectness¶
objectness score of each proposal.
- groundtruth_instance_masks¶
ground truth instance masks.
- groundtruth_instance_boundaries¶
ground truth instance boundaries.
- groundtruth_instance_classes¶
instance mask-level class labels.
- groundtruth_keypoints¶
ground truth keypoints.
- groundtruth_keypoint_visibilities¶
ground truth keypoint visibilities.
- groundtruth_label_scores¶
groundtruth label scores.
- groundtruth_weights¶
groundtruth weight factor for bounding boxes.
- num_groundtruth_boxes¶
number of groundtruth boxes.
- true_image_shapes¶
true shapes of images in the resized images, as resized images can be padded with zeros.
- image = 'image'¶
- mask = 'mask'¶
- width = 'width'¶
- height = 'height'¶
- original_image = 'original_image'¶
- optical_flow = 'optical_flow'¶
- key = 'key'¶
- source_id = 'source_id'¶
- filename = 'filename'¶
- dataset_name = 'dataset_name'¶
- groundtruth_image_classes = 'groundtruth_image_classes'¶
- groundtruth_image_classes_num = 'groundtruth_image_classes_num'¶
- groundtruth_boxes = 'groundtruth_boxes'¶
- groundtruth_classes = 'groundtruth_classes'¶
- groundtruth_label_types = 'groundtruth_label_types'¶
- groundtruth_is_crowd = 'groundtruth_is_crowd'¶
- groundtruth_area = 'groundtruth_area'¶
- groundtruth_difficult = 'groundtruth_difficult'¶
- groundtruth_group_of = 'groundtruth_group_of'¶
- proposal_boxes = 'proposal_boxes'¶
- proposal_objectness = 'proposal_objectness'¶
- groundtruth_instance_masks = 'groundtruth_instance_masks'¶
- groundtruth_instance_boundaries = 'groundtruth_instance_boundaries'¶
- groundtruth_instance_classes = 'groundtruth_instance_classes'¶
- groundtruth_keypoints = 'groundtruth_keypoints'¶
- groundtruth_keypoint_visibilities = 'groundtruth_keypoint_visibilities'¶
- groundtruth_label_scores = 'groundtruth_label_scores'¶
- groundtruth_weights = 'groundtruth_weights'¶
- num_groundtruth_boxes = 'num_groundtruth_boxes'¶
- true_image_shape = 'true_image_shape'¶
- original_image_shape = 'original_image_shape'¶
- original_instance_masks = 'original_instance_masks'¶
- groundtruth_boxes_absolute = 'groundtruth_boxes_absolute'¶
- groundtruth_keypoints_absolute = 'groundtruth_keypoints_absolute'¶
- label_map = 'label_map'¶
- char_dict = 'char_dict'¶
- class easycv.core.standard_fields.DetectionResultFields[source]¶
Bases:
object
Naming conventions for storing the output of the detector.
- source_id¶
source of the original image.
- key¶
unique key corresponding to image.
- detection_boxes¶
coordinates of the detection boxes in the image.
- detection_scores¶
detection scores for the detection boxes in the image.
- detection_classes¶
detection-level class labels.
- detection_masks¶
contains a segmentation mask for each detection box.
- detection_boundaries¶
contains an object boundary for each detection box.
- detection_keypoints¶
contains detection keypoints for each detection box.
- num_detections¶
number of detections in the batch.
- source_id = 'source_id'¶
- key = 'key'¶
- detection_boxes = 'detection_boxes'¶
- detection_scores = 'detection_scores'¶
- detection_classes = 'detection_classes'¶
- detection_masks = 'detection_masks'¶
- detection_boundaries = 'detection_boundaries'¶
- detection_keypoints = 'detection_keypoints'¶
- num_detections = 'num_detections'¶
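These classes only hold string constants, so a typical (hypothetical) use is keying tensor dicts consistently between readers, models and evaluators:

from easycv.core.standard_fields import (DetectionResultFields,
                                         InputDataFields)

fields = InputDataFields
sample = {
    fields.image: None,                # the image tensor would go here
    fields.groundtruth_boxes: None,    # ground-truth boxes
    fields.groundtruth_classes: None,  # box-level class labels
}
print(fields.image, fields.groundtruth_boxes)  # 'image' 'groundtruth_boxes'
print(DetectionResultFields.detection_boxes)   # 'detection_boxes'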
- class easycv.core.standard_fields.TfExampleFields[source]¶
Bases:
object
TF-example proto feature names for object detection.
Holds the standard feature names to load from an Example proto for object detection.
- image_encoded¶
JPEG encoded string
- image_format¶
image format, e.g. “JPEG”
- filename¶
filename
- channels¶
number of channels of image
- colorspace¶
colorspace, e.g. “RGB”
- height¶
height of image in pixels, e.g. 462
- width¶
width of image in pixels, e.g. 581
- source_id¶
original source of the image
- object_class_text¶
labels in text format, e.g. [“person”, “cat”]
- object_class_label¶
labels in numbers, e.g. [16, 8]
- object_bbox_xmin¶
xmin coordinates of groundtruth box, e.g. 10, 30
- object_bbox_xmax¶
xmax coordinates of groundtruth box, e.g. 50, 40
- object_bbox_ymin¶
ymin coordinates of groundtruth box, e.g. 40, 50
- object_bbox_ymax¶
ymax coordinates of groundtruth box, e.g. 80, 70
- object_view¶
viewpoint of object, e.g. [“frontal”, “left”]
- object_truncated¶
is object truncated, e.g. [true, false]
- object_occluded¶
is object occluded, e.g. [true, false]
- object_difficult¶
is object difficult, e.g. [true, false]
- object_group_of¶
is object a single object or a group of objects
- object_depiction¶
is object a depiction
- object_is_crowd¶
[DEPRECATED, use object_group_of instead] is the object a single object or a crowd
- object_segment_area¶
the area of the segment.
- object_weight¶
a weight factor for the object’s bounding box.
- instance_masks¶
instance segmentation masks.
- instance_boundaries¶
instance boundaries.
- instance_classes¶
Classes for each instance segmentation mask.
- detection_class_label¶
class label in numbers.
- detection_bbox_ymin¶
ymin coordinates of a detection box.
- detection_bbox_xmin¶
xmin coordinates of a detection box.
- detection_bbox_ymax¶
ymax coordinates of a detection box.
- detection_bbox_xmax¶
xmax coordinates of a detection box.
- detection_score¶
detection score for the class label and box.
- image_encoded = 'image/encoded'¶
- image_format = 'image/format'¶
- filename = 'image/filename'¶
- channels = 'image/channels'¶
- colorspace = 'image/colorspace'¶
- height = 'image/height'¶
- width = 'image/width'¶
- source_id = 'image/source_id'¶
- object_class_text = 'image/object/class/text'¶
- object_class_label = 'image/object/class/label'¶
- object_bbox_ymin = 'image/object/bbox/ymin'¶
- object_bbox_xmin = 'image/object/bbox/xmin'¶
- object_bbox_ymax = 'image/object/bbox/ymax'¶
- object_bbox_xmax = 'image/object/bbox/xmax'¶
- object_view = 'image/object/view'¶
- object_truncated = 'image/object/truncated'¶
- object_occluded = 'image/object/occluded'¶
- object_difficult = 'image/object/difficult'¶
- object_group_of = 'image/object/group_of'¶
- object_depiction = 'image/object/depiction'¶
- object_is_crowd = 'image/object/is_crowd'¶
- object_segment_area = 'image/object/segment/area'¶
- object_weight = 'image/object/weight'¶
- instance_masks = 'image/segmentation/object'¶
- instance_boundaries = 'image/boundaries/object'¶
- instance_classes = 'image/segmentation/object/class'¶
- detection_class_label = 'image/detection/label'¶
- detection_bbox_ymin = 'image/detection/bbox/ymin'¶
- detection_bbox_xmin = 'image/detection/bbox/xmin'¶
- detection_bbox_ymax = 'image/detection/bbox/ymax'¶
- detection_bbox_xmax = 'image/detection/bbox/xmax'¶
- detection_score = 'image/detection/score'¶
easycv.models package¶
Subpackages¶
easycv.models.backbones package¶
Submodules¶
easycv.models.backbones.benchmark_mlp module¶
- class easycv.models.backbones.benchmark_mlp.BenchMarkMLP(feature_num, num_classes=1000, avg_pool=False, **kwargs)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(feature_num, num_classes=1000, avg_pool=False, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
easycv.models.backbones.bninception module¶
This model is taken from the official PyTorch model zoo - torchvision.models.mobilenet.py on 31st Aug, 2019
- class easycv.models.backbones.bninception.BNInception(num_classes=0)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(num_classes=0)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(input)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
easycv.models.backbones.darknet module¶
- class easycv.models.backbones.darknet.Darknet(depth, in_channels=3, stem_out_channels=32, out_features=('dark3', 'dark4', 'dark5'))[source]¶
Bases:
torch.nn.modules.module.Module
- depth2blocks = {21: [1, 2, 2, 1], 53: [2, 8, 8, 4]}¶
- __init__(depth, in_channels=3, stem_out_channels=32, out_features=('dark3', 'dark4', 'dark5'))[source]¶
- Parameters
depth (int) – depth of the darknet used in the model; typically 21 or 53.
in_channels (int) – number of input channels, for example, use 3 for RGB images.
stem_out_channels (int) – number of output channels of the darknet stem. It decides channels of darknet layer2 to layer5.
out_features (Tuple[str]) – desired output layer name.
- make_group_layer(in_channels: int, num_blocks: int, stride: int = 1)[source]¶
Starts with a conv layer, followed by num_blocks ResLayer blocks.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
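A construction sketch for Darknet using the documented constructor arguments; the exact structure of the forward output (for example a dict keyed by the names in out_features) is an assumption and may differ:

import torch
from easycv.models.backbones.darknet import Darknet

model = Darknet(depth=53, in_channels=3, stem_out_channels=32,
                out_features=('dark3', 'dark4', 'dark5'))
model.eval()
with torch.no_grad():
    feats = model(torch.rand(1, 3, 416, 416))  # multi-scale features for dark3-dark5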
- class easycv.models.backbones.darknet.CSPDarknet(dep_mul, wid_mul, out_features=('dark3', 'dark4', 'dark5'), depthwise=False, act='silu')[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(dep_mul, wid_mul, out_features=('dark3', 'dark4', 'dark5'), depthwise=False, act='silu')[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
easycv.models.backbones.genet module¶
- class easycv.models.backbones.genet.PlainNetBasicBlockClass(in_channels=0, out_channels=0, stride=1, no_create=False, block_name=None, **kwargs)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(in_channels=0, out_channels=0, stride=1, no_create=False, block_name=None, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.genet.AdaptiveAvgPool(out_channels, output_size, no_create=False, block_name=None, **kwargs)[source]¶
Bases:
easycv.models.backbones.genet.PlainNetBasicBlockClass
- __init__(out_channels, output_size, no_create=False, block_name=None, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.genet.BN(out_channels=None, copy_from=None, no_create=False, block_name=None, **kwargs)[source]¶
Bases:
easycv.models.backbones.genet.PlainNetBasicBlockClass
- __init__(out_channels=None, copy_from=None, no_create=False, block_name=None, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.genet.ConvDW(out_channels=None, kernel_size=None, stride=None, copy_from=None, no_create=False, block_name=None, **kwargs)[source]¶
Bases:
easycv.models.backbones.genet.PlainNetBasicBlockClass
- __init__(out_channels=None, kernel_size=None, stride=None, copy_from=None, no_create=False, block_name=None, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.genet.ConvKX(in_channels=None, out_channels=None, kernel_size=None, stride=None, copy_from=None, no_create=False, block_name=None, **kwargs)[source]¶
Bases:
easycv.models.backbones.genet.PlainNetBasicBlockClass
- __init__(in_channels=None, out_channels=None, kernel_size=None, stride=None, copy_from=None, no_create=False, block_name=None, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.genet.Flatten(out_channels, no_create=False, block_name=None, **kwargs)[source]¶
Bases:
easycv.models.backbones.genet.PlainNetBasicBlockClass
- __init__(out_channels, no_create=False, block_name=None, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.genet.Linear(in_channels=None, out_channels=None, bias=None, copy_from=None, no_create=False, block_name=None, **kwargs)[source]¶
Bases:
easycv.models.backbones.genet.PlainNetBasicBlockClass
- __init__(in_channels=None, out_channels=None, bias=None, copy_from=None, no_create=False, block_name=None, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.genet.MaxPool(out_channels, kernel_size, stride, no_create=False, block_name=None, **kwargs)[source]¶
Bases:
easycv.models.backbones.genet.PlainNetBasicBlockClass
- __init__(out_channels, kernel_size, stride, no_create=False, block_name=None, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.genet.MultiSumBlock(inner_block_list, no_create=False, block_name=None, **kwargs)[source]¶
Bases:
easycv.models.backbones.genet.PlainNetBasicBlockClass
- __init__(inner_block_list, no_create=False, block_name=None, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.genet.RELU(out_channels, no_create=False, block_name=None, **kwargs)[source]¶
Bases:
easycv.models.backbones.genet.PlainNetBasicBlockClass
- __init__(out_channels, no_create=False, block_name=None, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.genet.ResBlock(inner_block_list, in_channels=None, stride=None, no_create=False, block_name=None, **kwargs)[source]¶
Bases:
easycv.models.backbones.genet.PlainNetBasicBlockClass
ResBlock(in_channels, inner_blocks_str). If in_channels is missing, use inner_block_list[0].in_channels as in_channels
- __init__(inner_block_list, in_channels=None, stride=None, no_create=False, block_name=None, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.genet.Sequential(inner_block_list, no_create=False, block_name=None, **kwargs)[source]¶
Bases:
easycv.models.backbones.genet.PlainNetBasicBlockClass
- __init__(inner_block_list, no_create=False, block_name=None, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.genet.SuperResKXKX(in_channels=0, out_channels=0, kernel_size=3, stride=1, expansion=1.0, sublayers=1, no_create=False, block_name=None, **kwargs)[source]¶
Bases:
easycv.models.backbones.genet.PlainNetBasicBlockClass
- __init__(in_channels=0, out_channels=0, kernel_size=3, stride=1, expansion=1.0, sublayers=1, no_create=False, block_name=None, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.genet.SuperResK1KX(in_channels=0, out_channels=0, kernel_size=3, stride=1, expansion=1.0, sublayers=1, no_create=False, block_name=None, **kwargs)[source]¶
Bases:
easycv.models.backbones.genet.PlainNetBasicBlockClass
- __init__(in_channels=0, out_channels=0, kernel_size=3, stride=1, expansion=1.0, sublayers=1, no_create=False, block_name=None, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.genet.SuperResK1KXK1(in_channels=0, out_channels=0, kernel_size=3, stride=1, expansion=1.0, sublayers=1, no_create=False, block_name=None, **kwargs)[source]¶
Bases:
easycv.models.backbones.genet.PlainNetBasicBlockClass
- __init__(in_channels=0, out_channels=0, kernel_size=3, stride=1, expansion=1.0, sublayers=1, no_create=False, block_name=None, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.genet.SuperResK1DWK1(in_channels=0, out_channels=0, kernel_size=3, stride=1, expansion=1.0, sublayers=1, no_create=False, block_name=None, **kwargs)[source]¶
Bases:
easycv.models.backbones.genet.PlainNetBasicBlockClass
- __init__(in_channels=0, out_channels=0, kernel_size=3, stride=1, expansion=1.0, sublayers=1, no_create=False, block_name=None, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.genet.SuperResK1DW(in_channels=0, out_channels=0, kernel_size=3, stride=1, expansion=1.0, sublayers=1, no_create=False, block_name=None, **kwargs)[source]¶
Bases:
easycv.models.backbones.genet.PlainNetBasicBlockClass
- __init__(in_channels=0, out_channels=0, kernel_size=3, stride=1, expansion=1.0, sublayers=1, no_create=False, block_name=None, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- training: bool¶
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class easycv.models.backbones.genet.PlainNet(plainnet_struct_idx=None, num_classes=0, no_create=False, **kwargs)[source]¶
Bases:
torch.nn.modules.module.Module
- training: bool¶
- __init__(plainnet_struct_idx=None, num_classes=0, no_create=False, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
easycv.models.backbones.hrnet module¶
- easycv.models.backbones.hrnet.get_expansion(block, expansion=None)[source]¶
Get the expansion of a residual block.
The block expansion will be obtained in the following order:
1. If expansion is given, just return it.
2. If block has the attribute expansion, then return block.expansion.
3. Return the default value according to the block type: 1 for BasicBlock and 4 for Bottleneck.
- Parameters
block (class) – The block class.
expansion (int | None) – The given expansion ratio.
- Returns
The expansion of the block.
- Return type
int
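A quick sketch of get_expansion with the Bottleneck block defined below in this module; the expected values follow the resolution order described above:

from easycv.models.backbones.hrnet import Bottleneck, get_expansion

print(get_expansion(Bottleneck))     # 4, the default for Bottleneck-type blocks
print(get_expansion(Bottleneck, 2))  # 2, an explicitly given expansion wins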
- class easycv.models.backbones.hrnet.Bottleneck(in_channels, out_channels, expansion=4, stride=1, dilation=1, downsample=None, style='pytorch', with_cp=False, conv_cfg=None, norm_cfg={'type': 'BN'})[source]¶
Bases:
torch.nn.modules.module.Module
Bottleneck block for ResNet.
- Parameters
in_channels (int) – Input channels of this block.
out_channels (int) – Output channels of this block.
expansion (int) – The ratio of out_channels/mid_channels, where mid_channels is the input/output channels of conv2. Default: 4.
stride (int) – stride of the block. Default: 1
dilation (int) – dilation of convolution. Default: 1
downsample (nn.Module) – downsample operation on identity branch. Default: None.
style (str) – "pytorch" or "caffe". If set to "pytorch", the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. Default: "pytorch".
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
conv_cfg (dict) – dictionary to construct and config conv layer. Default: None
norm_cfg (dict) – dictionary to construct and config norm layer. Default: dict(type=’BN’)
- __init__(in_channels, out_channels, expansion=4, stride=1, dilation=1, downsample=None, style='pytorch', with_cp=False, conv_cfg=None, norm_cfg={'type': 'BN'})[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- property norm1¶
the normalization layer named “norm1”
- Type
nn.Module
- property norm2¶
the normalization layer named “norm2”
- Type
nn.Module
- property norm3¶
the normalization layer named “norm3”
- Type
nn.Module
- training: bool¶
- class easycv.models.backbones.hrnet.HRModule(num_branches, blocks, num_blocks, in_channels, num_channels, multiscale_output=False, with_cp=False, conv_cfg=None, norm_cfg={'type': 'BN'}, upsample_cfg={'align_corners': None, 'mode': 'nearest'})[source]¶
Bases:
torch.nn.modules.module.Module
High-Resolution Module for HRNet.
In this module, every branch has 4 BasicBlocks/Bottlenecks. Fusion/Exchange is in this module.
- __init__(num_branches, blocks, num_blocks, in_channels, num_channels, multiscale_output=False, with_cp=False, conv_cfg=None, norm_cfg={'type': 'BN'}, upsample_cfg={'align_corners': None, 'mode': 'nearest'})[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- training: bool¶
- class easycv.models.backbones.hrnet.HRNet(arch='w32', extra=None, in_channels=3, conv_cfg=None, norm_cfg={'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=False, multi_scale_output=False)[source]¶
Bases:
torch.nn.modules.module.Module
HRNet backbone.
High-Resolution Representations for Labeling Pixels and Regions
- Parameters
extra (dict) – detailed configuration for each stage of HRNet.
in_channels (int) – Number of input image channels. Default: 3.
conv_cfg (dict) – dictionary to construct and config conv layer.
norm_cfg (dict) – dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity.
Example
>>> from mmpose.models import HRNet
>>> import torch
>>> extra = dict(
>>>     stage1=dict(
>>>         num_modules=1,
>>>         num_branches=1,
>>>         block='BOTTLENECK',
>>>         num_blocks=(4, ),
>>>         num_channels=(64, )),
>>>     stage2=dict(
>>>         num_modules=1,
>>>         num_branches=2,
>>>         block='BASIC',
>>>         num_blocks=(4, 4),
>>>         num_channels=(32, 64)),
>>>     stage3=dict(
>>>         num_modules=4,
>>>         num_branches=3,
>>>         block='BASIC',
>>>         num_blocks=(4, 4, 4),
>>>         num_channels=(32, 64, 128)),
>>>     stage4=dict(
>>>         num_modules=3,
>>>         num_branches=4,
>>>         block='BASIC',
>>>         num_blocks=(4, 4, 4, 4),
>>>         num_channels=(32, 64, 128, 256)))
>>> self = HRNet(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 32, 8, 8)
- blocks_dict = {'BASIC': <class 'easycv.models.backbones.resnet.BasicBlock'>, 'BOTTLENECK': <class 'easycv.models.backbones.hrnet.Bottleneck'>}¶
- arch_zoo = {'w18': [[1, 1, 'BOTTLENECK', (4,), (64,)], [1, 2, 'BASIC', (4, 4), (18, 36)], [4, 3, 'BASIC', (4, 4, 4), (18, 36, 72)], [3, 4, 'BASIC', (4, 4, 4, 4), (18, 36, 72, 144)]], 'w30': [[1, 1, 'BOTTLENECK', (4,), (64,)], [1, 2, 'BASIC', (4, 4), (30, 60)], [4, 3, 'BASIC', (4, 4, 4), (30, 60, 120)], [3, 4, 'BASIC', (4, 4, 4, 4), (30, 60, 120, 240)]], 'w32': [[1, 1, 'BOTTLENECK', (4,), (64,)], [1, 2, 'BASIC', (4, 4), (32, 64)], [4, 3, 'BASIC', (4, 4, 4), (32, 64, 128)], [3, 4, 'BASIC', (4, 4, 4, 4), (32, 64, 128, 256)]], 'w40': [[1, 1, 'BOTTLENECK', (4,), (64,)], [1, 2, 'BASIC', (4, 4), (40, 80)], [4, 3, 'BASIC', (4, 4, 4), (40, 80, 160)], [3, 4, 'BASIC', (4, 4, 4, 4), (40, 80, 160, 320)]], 'w44': [[1, 1, 'BOTTLENECK', (4,), (64,)], [1, 2, 'BASIC', (4, 4), (44, 88)], [4, 3, 'BASIC', (4, 4, 4), (44, 88, 176)], [3, 4, 'BASIC', (4, 4, 4, 4), (44, 88, 176, 352)]], 'w48': [[1, 1, 'BOTTLENECK', (4,), (64,)], [1, 2, 'BASIC', (4, 4), (48, 96)], [4, 3, 'BASIC', (4, 4, 4), (48, 96, 192)], [3, 4, 'BASIC', (4, 4, 4, 4), (48, 96, 192, 384)]], 'w64': [[1, 1, 'BOTTLENECK', (4,), (64,)], [1, 2, 'BASIC', (4, 4), (64, 128)], [4, 3, 'BASIC', (4, 4, 4), (64, 128, 256)], [3, 4, 'BASIC', (4, 4, 4, 4), (64, 128, 256, 512)]]}¶
- __init__(arch='w32', extra=None, in_channels=3, conv_cfg=None, norm_cfg={'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=False, multi_scale_output=False)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- property norm1¶
the normalization layer named “norm1”
- Type
nn.Module
- property norm2¶
the normalization layer named “norm2”
- Type
nn.Module
- init_weights(pretrained=None)[source]¶
Initialize the weights in backbone.
- Parameters
pretrained (str, optional) – Path to pre-trained weights. Defaults to None.
- training: bool¶
easycv.models.backbones.inceptionv3 module¶
This model is taken from the official PyTorch model zoo - torchvision.models.inception.py on 31st Aug, 2019
- class easycv.models.backbones.inceptionv3.Inception3(num_classes: int = 0, aux_logits: bool = True, transform_input: bool = False)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(num_classes: int = 0, aux_logits: bool = True, transform_input: bool = False) → None[source]¶
- Parameters
num_classes – number of classes based on dataset.
aux_logits – If True, adds two auxiliary branches that can improve training. Defaults to True, or False when pretrained is True.
transform_input – If True, preprocesses the input according to the method with which it was trained on ImageNet. Default: False
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
easycv.models.backbones.lighthrnet module¶
- easycv.models.backbones.lighthrnet.channel_shuffle(x, groups)[source]¶
Channel Shuffle operation.
This function enables cross-group information flow for multiple groups convolution layers.
- Parameters
x (Tensor) – The input tensor.
groups (int) – The number of groups to divide the input tensor in the channel dimension.
- Returns
The output tensor after channel shuffle operation.
- Return type
Tensor
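A small sketch of channel_shuffle; groups must evenly divide the channel dimension:

import torch
from easycv.models.backbones.lighthrnet import channel_shuffle

x = torch.rand(2, 8, 16, 16)      # (N, C, H, W) with C divisible by groups
y = channel_shuffle(x, groups=2)  # same shape, channels interleaved across the 2 groups
print(y.shape)                    # torch.Size([2, 8, 16, 16])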
- class easycv.models.backbones.lighthrnet.SpatialWeighting(channels, ratio=16, conv_cfg=None, norm_cfg=None, act_cfg=({'type': 'ReLU'}, {'type': 'Sigmoid'}))[source]¶
Bases:
torch.nn.modules.module.Module
Spatial weighting module.
- Parameters
channels (int) – The channels of the module.
ratio (int) – channel reduction ratio.
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: None.
act_cfg (dict) – Config dict for activation layer. Default: (dict(type=’ReLU’), dict(type=’Sigmoid’)). The last ConvModule uses Sigmoid by default.
- __init__(channels, ratio=16, conv_cfg=None, norm_cfg=None, act_cfg=({'type': 'ReLU'}, {'type': 'Sigmoid'}))[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.lighthrnet.CrossResolutionWeighting(channels, ratio=16, conv_cfg=None, norm_cfg=None, act_cfg=({'type': 'ReLU'}, {'type': 'Sigmoid'}))[source]¶
Bases:
torch.nn.modules.module.Module
Cross-resolution channel weighting module.
- Parameters
channels (int) – The channels of the module.
ratio (int) – channel reduction ratio.
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: None.
act_cfg (dict) – Config dict for activation layer. Default: (dict(type=’ReLU’), dict(type=’Sigmoid’)). The last ConvModule uses Sigmoid by default.
- __init__(channels, ratio=16, conv_cfg=None, norm_cfg=None, act_cfg=({'type': 'ReLU'}, {'type': 'Sigmoid'}))[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.lighthrnet.ConditionalChannelWeighting(in_channels, stride, reduce_ratio, conv_cfg=None, norm_cfg={'type': 'BN'}, with_cp=False)[source]¶
Bases:
torch.nn.modules.module.Module
Conditional channel weighting block.
- Parameters
in_channels (int) – The input channels of the block.
stride (int) – Stride of the 3x3 convolution layer.
reduce_ratio (int) – channel reduction ratio.
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
- __init__(in_channels, stride, reduce_ratio, conv_cfg=None, norm_cfg={'type': 'BN'}, with_cp=False)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.lighthrnet.Stem(in_channels, stem_channels, out_channels, expand_ratio, conv_cfg=None, norm_cfg={'type': 'BN'}, with_cp=False)[source]¶
Bases:
torch.nn.modules.module.Module
Stem network block.
- Parameters
in_channels (int) – The input channels of the block.
stem_channels (int) – Output channels of the stem layer.
out_channels (int) – The output channels of the block.
expand_ratio (int) – adjusts number of channels of the hidden layer in InvertedResidual by this amount.
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
- __init__(in_channels, stem_channels, out_channels, expand_ratio, conv_cfg=None, norm_cfg={'type': 'BN'}, with_cp=False)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.lighthrnet.IterativeHead(in_channels, norm_cfg={'type': 'BN'})[source]¶
Bases:
torch.nn.modules.module.Module
Extra iterative head for feature learning.
- Parameters
in_channels (int) – The input channels of the block.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
- __init__(in_channels, norm_cfg={'type': 'BN'})[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.lighthrnet.ShuffleUnit(in_channels, out_channels, stride=1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, with_cp=False)[source]¶
Bases:
torch.nn.modules.module.Module
InvertedResidual block for ShuffleNetV2 backbone.
- Parameters
in_channels (int) – The input channels of the block.
out_channels (int) – The output channels of the block.
stride (int) – Stride of the 3x3 convolution layer. Default: 1
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’).
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
- __init__(in_channels, out_channels, stride=1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, with_cp=False)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.lighthrnet.LiteHRModule(num_branches, num_blocks, in_channels, reduce_ratio, module_type, multiscale_output=False, with_fuse=True, conv_cfg=None, norm_cfg={'type': 'BN'}, with_cp=False)[source]¶
Bases:
torch.nn.modules.module.Module
High-Resolution Module for LiteHRNet.
It contains conditional channel weighting blocks and shuffle blocks.
- Parameters
num_branches (int) – Number of branches in the module.
num_blocks (int) – Number of blocks in the module.
in_channels (list(int)) – Number of input image channels.
reduce_ratio (int) – Channel reduction ratio.
module_type (str) – ‘LITE’ or ‘NAIVE’
multiscale_output (bool) – Whether to output multi-scale features.
with_fuse (bool) – Whether to use fuse layers.
conv_cfg (dict) – dictionary to construct and config conv layer.
norm_cfg (dict) – dictionary to construct and config norm layer.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
- __init__(num_branches, num_blocks, in_channels, reduce_ratio, module_type, multiscale_output=False, with_fuse=True, conv_cfg=None, norm_cfg={'type': 'BN'}, with_cp=False)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- training: bool¶
- class easycv.models.backbones.lighthrnet.LiteHRNet(extra, in_channels=3, conv_cfg=None, norm_cfg={'type': 'BN'}, norm_eval=False, with_cp=False)[source]¶
Bases:
torch.nn.modules.module.Module
Lite-HRNet backbone.
Lite-HRNet: A Lightweight High-Resolution Network
Code adapted from https://github.com/HRNet/Lite-HRNet/blob/hrnet/models/backbones/litehrnet.py
- Parameters
extra (dict) – detailed configuration for each stage of HRNet.
in_channels (int) – Number of input image channels. Default: 3.
conv_cfg (dict) – dictionary to construct and config conv layer.
norm_cfg (dict) – dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
Example
>>> from mmpose.models import LiteHRNet
>>> import torch
>>> extra=dict(
>>>     stem=dict(stem_channels=32, out_channels=32, expand_ratio=1),
>>>     num_stages=3,
>>>     stages_spec=dict(
>>>         num_modules=(2, 4, 2),
>>>         num_branches=(2, 3, 4),
>>>         num_blocks=(2, 2, 2),
>>>         module_type=('LITE', 'LITE', 'LITE'),
>>>         with_fuse=(True, True, True),
>>>         reduce_ratios=(8, 8, 8),
>>>         num_channels=(
>>>             (40, 80),
>>>             (40, 80, 160),
>>>             (40, 80, 160, 320),
>>>         )),
>>>     with_head=False)
>>> self = LiteHRNet(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 40, 8, 8)
- __init__(extra, in_channels=3, conv_cfg=None, norm_cfg={'type': 'BN'}, norm_eval=False, with_cp=False)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- init_weights(pretrained=None)[source]¶
Initialize the weights in backbone.
- Parameters
pretrained (str, optional) – Path to pre-trained weights. Defaults to None.
- training: bool¶
easycv.models.backbones.mae_vit_transformer module¶
Mostly copy-paste from https://github.com/facebookresearch/mae/blob/main/models_mae.py
- class easycv.models.backbones.mae_vit_transformer.MaskedAutoencoderViT(img_size=224, patch_size=16, in_chans=3, embed_dim=1024, depth=24, num_heads=16, mlp_ratio=4.0, norm_layer=functools.partial(<class 'torch.nn.modules.normalization.LayerNorm'>, eps=1e-06))[source]¶
Bases:
torch.nn.modules.module.Module
- Masked Autoencoder with VisionTransformer backbone.
MaskedAutoencoderViT is mostly the same as vit_tranformer_dynamic, but adds a random_masking function. A MaskedAutoencoderViT model can be loaded by vit_tranformer_dynamic.
- Parameters
img_size (int) – input image size
patch_size (int) – patch size
in_chans (int) – input image channels
embed_dim (int) – feature dimensions
depth (int) – number of encoder layers
num_heads (int) – Parallel attention heads
mlp_ratio (float) – mlp ratio
norm_layer – type of normalization layer
- __init__(img_size=224, patch_size=16, in_chans=3, embed_dim=1024, depth=24, num_heads=16, mlp_ratio=4.0, norm_layer=functools.partial(<class 'torch.nn.modules.normalization.LayerNorm'>, eps=1e-06))[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- random_masking(x, mask_ratio)[source]¶
Perform per-sample random masking by per-sample shuffling. Per-sample shuffling is done by argsorting random noise. x: [N, L, D] sequence.
- forward(x, mask_ratio)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
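A construction sketch for MaskedAutoencoderViT; what exactly forward returns (for example latent tokens plus the mask and restore indices, as in typical MAE implementations) is an assumption, so only the call itself is shown:

import torch
from easycv.models.backbones.mae_vit_transformer import MaskedAutoencoderViT

encoder = MaskedAutoencoderViT(img_size=224, patch_size=16, in_chans=3,
                               embed_dim=1024, depth=24, num_heads=16)
imgs = torch.rand(2, 3, 224, 224)
outputs = encoder(imgs, mask_ratio=0.75)  # forward(x, mask_ratio) per the signature above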
easycv.models.backbones.mnasnet module¶
This model is taken from the official PyTorch model zoo - torchvision.models.mnasnet.py on 31st Aug, 2019
- class easycv.models.backbones.mnasnet.MNASNet(alpha, num_classes=0, dropout=0.2)[source]¶
Bases:
torch.nn.modules.module.Module
MNASNet, as described in https://arxiv.org/pdf/1807.11626.pdf.
>>> model = MNASNet(1000, 1.0)
>>> x = torch.rand(1, 3, 224, 224)
>>> y = model(x)
>>> y.dim()
1
>>> y.nelement()
1000
- __init__(alpha, num_classes=0, dropout=0.2)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
easycv.models.backbones.mobilenetv2 module¶
This model is taken from the official PyTorch model zoo - torchvision.models.mobilenet.py on 31st Aug, 2019
- class easycv.models.backbones.mobilenetv2.MobileNetV2(num_classes=0, width_multi=1.0, inverted_residual_setting=None, round_nearest=8)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(num_classes=0, width_multi=1.0, inverted_residual_setting=None, round_nearest=8)[source]¶
MobileNet V2 main class.
- Parameters
num_classes (int) – Number of classes.
width_multi (float) – Width multiplier - adjusts number of channels in each layer by this amount.
inverted_residual_setting – Network structure.
round_nearest (int) – Round the number of channels in each layer to be a multiple of this number. Set to 1 to turn off rounding.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
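A construction sketch for the MobileNetV2 backbone; with num_classes=0 it is presumably used as a feature extractor, so the shape of the forward output is not asserted here:

import torch
from easycv.models.backbones.mobilenetv2 import MobileNetV2

backbone = MobileNetV2(num_classes=0, width_multi=1.0, round_nearest=8)
backbone.eval()
with torch.no_grad():
    feats = backbone(torch.rand(1, 3, 224, 224))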
easycv.models.backbones.network_blocks module¶
- class easycv.models.backbones.network_blocks.SiLU(inplace=True)[source]¶
Bases:
torch.nn.modules.module.Module
export-friendly inplace version of nn.SiLU()
- __init__(inplace=True)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- static forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.network_blocks.HSiLU(inplace=True)[source]¶
Bases:
torch.nn.modules.module.Module
Export-friendly inplace version of nn.SiLU(); uses hard sigmoid, which is better suited than sigmoid for edge models.
- __init__(inplace=True)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- static forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.network_blocks.BaseConv(in_channels, out_channels, ksize, stride, groups=1, bias=False, act='silu')[source]¶
Bases:
torch.nn.modules.module.Module
A Conv2d -> Batchnorm -> silu/leaky relu block
- __init__(in_channels, out_channels, ksize, stride, groups=1, bias=False, act='silu')[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
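A small sketch of the BaseConv block (Conv2d -> BatchNorm -> activation) with the documented constructor arguments; the output spatial size depends on the block's internal padding, so it is not asserted here:

import torch
from easycv.models.backbones.network_blocks import BaseConv

conv = BaseConv(in_channels=3, out_channels=16, ksize=3, stride=2, act='silu')
y = conv(torch.rand(1, 3, 64, 64))
print(y.shape)  # 16 output channels; spatial size roughly halved by stride=2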
- class easycv.models.backbones.network_blocks.DWConv(in_channels, out_channels, ksize, stride=1, act='silu')[source]¶
Bases:
torch.nn.modules.module.Module
Depthwise Conv + Conv
- __init__(in_channels, out_channels, ksize, stride=1, act='silu')[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.network_blocks.Bottleneck(in_channels, out_channels, shortcut=True, expansion=0.5, depthwise=False, act='silu')[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(in_channels, out_channels, shortcut=True, expansion=0.5, depthwise=False, act='silu')[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.network_blocks.ResLayer(in_channels: int)[source]¶
Bases:
torch.nn.modules.module.Module
Residual layer with in_channels inputs.
- __init__(in_channels: int)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.network_blocks.SPPBottleneck(in_channels, out_channels, kernel_sizes=(5, 9, 13), activation='silu')[source]¶
Bases:
torch.nn.modules.module.Module
Spatial pyramid pooling layer used in YOLOv3-SPP
- __init__(in_channels, out_channels, kernel_sizes=(5, 9, 13), activation='silu')[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
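For intuition, the spatial pyramid pooling stage of SPPBottleneck pools the same feature map with several kernel sizes and concatenates the results along the channel dimension (a sketch of the pooling part only; the actual module also applies 1x1 convolutions before and after):
>>> import torch
>>> import torch.nn as nn
>>> x = torch.rand(1, 64, 32, 32)
>>> pools = [nn.MaxPool2d(k, stride=1, padding=k // 2) for k in (5, 9, 13)]
>>> out = torch.cat([x] + [p(x) for p in pools], dim=1)
>>> out.shape
torch.Size([1, 256, 32, 32])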
- class easycv.models.backbones.network_blocks.CSPLayer(in_channels, out_channels, n=1, shortcut=True, expansion=0.5, depthwise=False, act='silu')[source]¶
Bases:
torch.nn.modules.module.Module
CSP Bottleneck with 3 convolutions
- __init__(in_channels, out_channels, n=1, shortcut=True, expansion=0.5, depthwise=False, act='silu')[source]¶
- Parameters
in_channels (int) – input channels.
out_channels (int) – output channels.
n (int) – number of Bottlenecks. Default value: 1.
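Example (an illustrative usage sketch; it assumes the import path below and default arguments otherwise):
>>> import torch
>>> from easycv.models.backbones.network_blocks import CSPLayer
>>> layer = CSPLayer(in_channels=64, out_channels=64, n=1)
>>> out = layer(torch.rand(1, 64, 32, 32))  # spatial size is preserved; channel count follows out_channels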
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.network_blocks.Focus(in_channels, out_channels, ksize=1, stride=1, act='silu')[source]¶
Bases:
torch.nn.modules.module.Module
Focuses width and height information into channel space.
- __init__(in_channels, out_channels, ksize=1, stride=1, act='silu')[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
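The “focus” operation above rearranges a stride-2 spatial grid into channels before convolving; a rough sketch of that rearrangement (the module itself additionally applies a BaseConv to the concatenated patches):
>>> import torch
>>> x = torch.rand(1, 3, 224, 224)
>>> patches = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
...                      x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
>>> patches.shape
torch.Size([1, 12, 112, 112])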
easycv.models.backbones.pytorch_image_models_wrapper module¶
- class easycv.models.backbones.pytorch_image_models_wrapper.PytorchImageModelWrapper(model_name='resnet50', pretrained=False, checkpoint_path=None, scriptable=None, exportable=None, no_jit=None, **kwargs)[source]¶
Bases:
torch.nn.modules.module.Module
Support Backbones From pytorch-image-models.
The PyTorch community has lots of awesome contributions for image models. PyTorch Image Models (timm) is a collection of image models that aims to pull together a wide variety of SOTA models with the ability to reproduce ImageNet training results.
Model pages can be found at https://rwightman.github.io/pytorch-image-models/models/
References: https://github.com/rwightman/pytorch-image-models
- __init__(model_name='resnet50', pretrained=False, checkpoint_path=None, scriptable=None, exportable=None, no_jit=None, **kwargs)[source]¶
Initializes PytorchImageModelWrapper via timm.create_model.
- Parameters
model_name (str) – name of model to instantiate
pretrained (bool) – load pretrained ImageNet-1k weights if true
checkpoint_path (str) – path of checkpoint to load after model is initialized
scriptable (bool) – set layer config so that model is jit scriptable (not working for all models yet)
exportable (bool) – set layer config so that model is traceable / ONNX exportable (not fully impl/obeyed yet)
no_jit (bool) – set layer config so that model doesn’t utilize jit scripted layers (so far activations only)
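Example (an illustrative sketch; it assumes timm is installed and provides the model name shown):
>>> import torch
>>> from easycv.models.backbones.pytorch_image_models_wrapper import PytorchImageModelWrapper
>>> backbone = PytorchImageModelWrapper(model_name='resnet50', pretrained=False)
>>> out = backbone(torch.rand(1, 3, 224, 224))  # output format follows the wrapped timm model / backbone convention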
- training: bool¶
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
easycv.models.backbones.resnest module¶
ResNet variants
- class easycv.models.backbones.resnest.SplAtConv2d(in_channels, channels, kernel_size, stride=(1, 1), padding=(0, 0), dilation=(1, 1), groups=1, bias=True, radix=2, reduction_factor=4, rectify=False, rectify_avg=False, norm_layer=None, dropblock_prob=0.0, **kwargs)[source]¶
Bases:
torch.nn.modules.module.Module
Split-Attention Conv2d
- __init__(in_channels, channels, kernel_size, stride=(1, 1), padding=(0, 0), dilation=(1, 1), groups=1, bias=True, radix=2, reduction_factor=4, rectify=False, rectify_avg=False, norm_layer=None, dropblock_prob=0.0, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.resnest.rSoftMax(radix, cardinality)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(radix, cardinality)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.resnest.GlobalAvgPool2d[source]¶
Bases:
torch.nn.modules.module.Module
- forward(inputs)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.resnest.Bottleneck(inplanes, planes, stride=1, downsample=None, radix=1, cardinality=1, bottleneck_width=64, avd=False, avd_first=False, dilation=1, is_first=False, rectified_conv=False, rectify_avg=False, norm_layer=None, dropblock_prob=0.0, last_gamma=False)[source]¶
Bases:
torch.nn.modules.module.Module
ResNet Bottleneck
- expansion = 4¶
- __init__(inplanes, planes, stride=1, downsample=None, radix=1, cardinality=1, bottleneck_width=64, avd=False, avd_first=False, dilation=1, is_first=False, rectified_conv=False, rectify_avg=False, norm_layer=None, dropblock_prob=0.0, last_gamma=False)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.resnest.ResNeSt(depth=None, block=<class 'easycv.models.backbones.resnest.Bottleneck'>, layers=[3, 4, 6, 3], radix=2, groups=1, bottleneck_width=64, num_classes=0, dilated=False, dilation=1, deep_stem=True, stem_width=32, avg_down=True, rectified_conv=False, rectify_avg=False, avd=False, avd_first=False, final_drop=0.0, dropblock_prob=0, last_gamma=False, norm_layer=<class 'torch.nn.modules.batchnorm.BatchNorm2d'>)[source]¶
Bases:
torch.nn.modules.module.Module
ResNet Variants
- Parameters
block (Block) – Class for the residual block. Options are BasicBlockV1, BottleneckV1.
layers (list of int) – Numbers of layers in each block
classes (int, default 1000) – Number of classification classes.
dilated (bool, default False) – Applying dilation strategy to pretrained ResNet yielding a stride-8 model, typically used in Semantic Segmentation.
norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).
Reference –
He, Kaiming, et al. “Deep residual learning for image recognition.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
Yu, Fisher, and Vladlen Koltun. “Multi-scale context aggregation by dilated convolutions.”
- arch_settings = {50: ((3, 4, 6, 3), 32), 101: ((3, 4, 23, 3), 64), 200: ((3, 24, 36, 3), 64), 269: ((3, 30, 48, 8), 64)}¶
- __init__(depth=None, block=<class 'easycv.models.backbones.resnest.Bottleneck'>, layers=[3, 4, 6, 3], radix=2, groups=1, bottleneck_width=64, num_classes=0, dilated=False, dilation=1, deep_stem=True, stem_width=32, avg_down=True, rectified_conv=False, rectify_avg=False, avd=False, avd_first=False, final_drop=0.0, dropblock_prob=0, last_gamma=False, norm_layer=<class 'torch.nn.modules.batchnorm.BatchNorm2d'>)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- training: bool¶
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
easycv.models.backbones.resnet module¶
- class easycv.models.backbones.resnet.BasicBlock(inplanes, planes, stride=1, dilation=1, downsample=None, style='pytorch', with_cp=False, conv_cfg=None, norm_cfg={'type': 'BN'}, frelu=False)[source]¶
Bases:
torch.nn.modules.module.Module
- expansion = 1¶
- __init__(inplanes, planes, stride=1, dilation=1, downsample=None, style='pytorch', with_cp=False, conv_cfg=None, norm_cfg={'type': 'BN'}, frelu=False)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- property norm1¶
- property norm2¶
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.resnet.Bottleneck(inplanes, planes, stride=1, dilation=1, downsample=None, style='pytorch', with_cp=False, conv_cfg=None, norm_cfg={'type': 'BN'}, frelu=False)[source]¶
Bases:
torch.nn.modules.module.Module
- expansion = 4¶
- __init__(inplanes, planes, stride=1, dilation=1, downsample=None, style='pytorch', with_cp=False, conv_cfg=None, norm_cfg={'type': 'BN'}, frelu=False)[source]¶
Bottleneck block for ResNet. If style is “pytorch”, the stride-two layer is the 3x3 conv layer; if it is “caffe”, the stride-two layer is the first 1x1 conv layer.
- property norm1¶
- property norm2¶
- property norm3¶
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- easycv.models.backbones.resnet.make_res_layer(block, inplanes, planes, blocks, stride=1, dilation=1, style='pytorch', with_cp=False, conv_cfg=None, norm_cfg={'type': 'BN'}, frelu=False)[source]¶
- class easycv.models.backbones.resnet.ResNet(depth, in_channels=3, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(0, 1, 2, 3, 4), style='pytorch', num_classes=0, frozen_stages=- 1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, frelu=False, original_inplanes=64, zero_init_residual=False)[source]¶
Bases:
torch.nn.modules.module.Module
ResNet backbone.
- Parameters
depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
in_channels (int) – Number of input image channels. Normally 3.
num_stages (int) – Resnet stages, normally 4.
strides (Sequence[int]) – Strides of the first block of each stage.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters.
norm_cfg (dict) – dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
original_inplanes – start channel for first block, default=64
zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity.
Example
>>> from easycv.models import ResNet
>>> import torch
>>> self = ResNet(depth=18)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 8, 8)
(1, 128, 4, 4)
(1, 256, 2, 2)
(1, 512, 1, 1)
- arch_settings = {10: (<class 'easycv.models.backbones.resnet.BasicBlock'>, (1, 1, 1, 1)), 18: (<class 'easycv.models.backbones.resnet.BasicBlock'>, (2, 2, 2, 2)), 34: (<class 'easycv.models.backbones.resnet.BasicBlock'>, (3, 4, 6, 3)), 50: (<class 'easycv.models.backbones.resnet.Bottleneck'>, (3, 4, 6, 3)), 101: (<class 'easycv.models.backbones.resnet.Bottleneck'>, (3, 4, 23, 3)), 152: (<class 'easycv.models.backbones.resnet.Bottleneck'>, (3, 8, 36, 3))}¶
- __init__(depth, in_channels=3, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(0, 1, 2, 3, 4), style='pytorch', num_classes=0, frozen_stages=- 1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, frelu=False, original_inplanes=64, zero_init_residual=False)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- property norm1¶
- training: bool¶
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- train(mode=True)[source]¶
Sets the module in training mode.
This has any effect only on certain modules. See the documentation of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns
self
- Return type
Module
easycv.models.backbones.resnet_jit module¶
- class easycv.models.backbones.resnet_jit.BasicBlock(inplanes, planes, stride=1, dilation=1, downsample=None, style='pytorch', with_cp=False, conv_cfg=None, norm_cfg={'type': 'BN'})[source]¶
Bases:
torch.nn.modules.module.Module
- expansion = 1¶
- __init__(inplanes, planes, stride=1, dilation=1, downsample=None, style='pytorch', with_cp=False, conv_cfg=None, norm_cfg={'type': 'BN'})[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- property norm1¶
- property norm2¶
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.resnet_jit.Bottleneck(inplanes, planes, stride=1, dilation=1, downsample=None, style='pytorch', with_cp=False, conv_cfg=None, norm_cfg={'type': 'BN'})[source]¶
Bases:
torch.nn.modules.module.Module
- expansion = 4¶
- __init__(inplanes, planes, stride=1, dilation=1, downsample=None, style='pytorch', with_cp=False, conv_cfg=None, norm_cfg={'type': 'BN'})[source]¶
Bottleneck block for ResNet. If style is “pytorch”, the stride-two layer is the 3x3 conv layer; if it is “caffe”, the stride-two layer is the first 1x1 conv layer.
- property norm1¶
- property norm2¶
- property norm3¶
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- easycv.models.backbones.resnet_jit.make_res_layer(block, inplanes, planes, blocks, stride=1, dilation=1, style='pytorch', with_cp=False, conv_cfg=None, norm_cfg={'type': 'BN'})[source]¶
- class easycv.models.backbones.resnet_jit.ResNetJIT(depth, in_channels=3, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(0, 1, 2, 3, 4), style='pytorch', frozen_stages=- 1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=False)[source]¶
Bases:
torch.nn.modules.module.Module
ResNet backbone.
- Parameters
depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
in_channels (int) – Number of input image channels. Normally 3.
num_stages (int) – Resnet stages, normally 4.
strides (Sequence[int]) – Strides of the first block of each stage.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters.
norm_cfg (dict) – dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity.
Example
>>> from easycv.models import ResNet
>>> import torch
>>> self = ResNet(depth=18)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 8, 8)
(1, 128, 4, 4)
(1, 256, 2, 2)
(1, 512, 1, 1)
- arch_settings = {18: (<class 'easycv.models.backbones.resnet_jit.BasicBlock'>, (2, 2, 2, 2)), 34: (<class 'easycv.models.backbones.resnet_jit.BasicBlock'>, (3, 4, 6, 3)), 50: (<class 'easycv.models.backbones.resnet_jit.Bottleneck'>, (3, 4, 6, 3)), 101: (<class 'easycv.models.backbones.resnet_jit.Bottleneck'>, (3, 4, 23, 3)), 152: (<class 'easycv.models.backbones.resnet_jit.Bottleneck'>, (3, 8, 36, 3))}¶
- __init__(depth, in_channels=3, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(0, 1, 2, 3, 4), style='pytorch', frozen_stages=- 1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=False)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- property norm1¶
- training: bool¶
- forward(x: torch.Tensor) → List[torch.Tensor][source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- train(mode=True)[source]¶
Sets the module in training mode.
This has any effect only on certain modules. See the documentation of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns
self
- Return type
Module
easycv.models.backbones.resnext module¶
- class easycv.models.backbones.resnext.Bottleneck(inplanes, planes, groups=1, base_width=4, **kwargs)[source]¶
Bases:
easycv.models.backbones.resnet.Bottleneck
- __init__(inplanes, planes, groups=1, base_width=4, **kwargs)[source]¶
Bottleneck block for ResNeXt. If style is “pytorch”, the stride-two layer is the 3x3 conv layer; if it is “caffe”, the stride-two layer is the first 1x1 conv layer.
- training: bool¶
- easycv.models.backbones.resnext.make_res_layer(block, inplanes, planes, blocks, stride=1, dilation=1, groups=1, base_width=4, style='pytorch', with_cp=False, conv_cfg=None, norm_cfg={'type': 'BN'})[source]¶
- class easycv.models.backbones.resnext.ResNeXt(groups=1, base_width=4, **kwargs)[source]¶
Bases:
easycv.models.backbones.resnet.ResNet
ResNeXt backbone.
- Parameters
depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
in_channels (int) – Number of input image channels. Normally 3.
num_stages (int) – Resnet stages, normally 4.
groups (int) – Group of resnext.
base_width (int) – Base width of resnext.
strides (Sequence[int]) – Strides of the first block of each stage.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.
norm_cfg (dict) – dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity.
Example
>>> from easycv.models import ResNeXt
>>> import torch
>>> self = ResNeXt(depth=50)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 8, 8)
(1, 512, 4, 4)
(1, 1024, 2, 2)
(1, 2048, 1, 1)
- arch_settings = {50: (<class 'easycv.models.backbones.resnext.Bottleneck'>, (3, 4, 6, 3)), 101: (<class 'easycv.models.backbones.resnext.Bottleneck'>, (3, 4, 23, 3)), 152: (<class 'easycv.models.backbones.resnext.Bottleneck'>, (3, 8, 36, 3))}¶
- __init__(groups=1, base_width=4, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- training: bool¶
easycv.models.backbones.shuffle_transformer module¶
- class easycv.models.backbones.shuffle_transformer.Mlp(in_features, hidden_features=None, out_features=None, act_layer=<class 'torch.nn.modules.activation.ReLU6'>, drop=0.0, stride=False)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(in_features, hidden_features=None, out_features=None, act_layer=<class 'torch.nn.modules.activation.ReLU6'>, drop=0.0, stride=False)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.shuffle_transformer.Attention(dim, num_heads, window_size=1, shuffle=False, qkv_bias=False, qk_scale=None, attn_drop=0.0, proj_drop=0.0, relative_pos_embedding=False)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(dim, num_heads, window_size=1, shuffle=False, qkv_bias=False, qk_scale=None, attn_drop=0.0, proj_drop=0.0, relative_pos_embedding=False)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.shuffle_transformer.Block(dim, out_dim, num_heads, window_size=1, shuffle=False, mlp_ratio=4.0, qkv_bias=False, qk_scale=None, drop=0.0, attn_drop=0.0, drop_path=0.0, act_layer=<class 'torch.nn.modules.activation.ReLU6'>, norm_layer=<class 'torch.nn.modules.batchnorm.BatchNorm2d'>, stride=False, relative_pos_embedding=False)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(dim, out_dim, num_heads, window_size=1, shuffle=False, mlp_ratio=4.0, qkv_bias=False, qk_scale=None, drop=0.0, attn_drop=0.0, drop_path=0.0, act_layer=<class 'torch.nn.modules.activation.ReLU6'>, norm_layer=<class 'torch.nn.modules.batchnorm.BatchNorm2d'>, stride=False, relative_pos_embedding=False)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.shuffle_transformer.PatchMerging(dim, out_dim, norm_layer=<class 'torch.nn.modules.batchnorm.BatchNorm2d'>)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(dim, out_dim, norm_layer=<class 'torch.nn.modules.batchnorm.BatchNorm2d'>)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- extra_repr() → str[source]¶
Set the extra representation of the module
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- training: bool¶
- class easycv.models.backbones.shuffle_transformer.StageModule(layers, dim, out_dim, num_heads, window_size=1, shuffle=True, mlp_ratio=4.0, qkv_bias=False, qk_scale=None, drop=0.0, attn_drop=0.0, drop_path=0.0, act_layer=<class 'torch.nn.modules.activation.ReLU6'>, norm_layer=<class 'torch.nn.modules.batchnorm.BatchNorm2d'>, relative_pos_embedding=False)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(layers, dim, out_dim, num_heads, window_size=1, shuffle=True, mlp_ratio=4.0, qkv_bias=False, qk_scale=None, drop=0.0, attn_drop=0.0, drop_path=0.0, act_layer=<class 'torch.nn.modules.activation.ReLU6'>, norm_layer=<class 'torch.nn.modules.batchnorm.BatchNorm2d'>, relative_pos_embedding=False)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.shuffle_transformer.PatchEmbedding(inter_channel=32, out_channels=48)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(inter_channel=32, out_channels=48)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.shuffle_transformer.ShuffleTransformer(img_size=224, in_chans=3, num_classes=1000, token_dim=32, embed_dim=96, mlp_ratio=4.0, layers=[2, 2, 6, 2], num_heads=[3, 6, 12, 24], relative_pos_embedding=True, shuffle=True, window_size=7, qkv_bias=True, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, has_pos_embed=False, **kwargs)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(img_size=224, in_chans=3, num_classes=1000, token_dim=32, embed_dim=96, mlp_ratio=4.0, layers=[2, 2, 6, 2], num_heads=[3, 6, 12, 24], relative_pos_embedding=True, shuffle=True, window_size=7, qkv_bias=True, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, has_pos_embed=False, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- easycv.models.backbones.shuffle_transformer.shuffletrans_base_p4_w7_224(pretrained=False, **kwargs)[source]¶
easycv.models.backbones.swin_transformer_dynamic module¶
This code is borrowed from https://github.com/microsoft/esvit/blob/main/models/swin_transformer.py to support a dynamic Swin Transformer for SSL.
- class easycv.models.backbones.swin_transformer_dynamic.Mlp(in_features, hidden_features=None, out_features=None, act_layer=<class 'torch.nn.modules.activation.GELU'>, drop=0.0)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(in_features, hidden_features=None, out_features=None, act_layer=<class 'torch.nn.modules.activation.GELU'>, drop=0.0)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- easycv.models.backbones.swin_transformer_dynamic.window_partition(x, window_size)[source]¶
- Parameters
x – (B, H, W, C)
window_size (int) – window size
- Returns
(num_windows*B, window_size, window_size, C)
- Return type
windows
- easycv.models.backbones.swin_transformer_dynamic.window_reverse(windows, window_size, H, W)[source]¶
- Parameters
windows – (num_windows*B, window_size, window_size, C)
window_size (int) – Window size
H (int) – Height of image
W (int) – Width of image
- Returns
(B, H, W, C)
- Return type
x
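A round-trip sketch of the two helpers above (illustrative; it assumes H and W are divisible by window_size, and the shapes follow the documented return types):
>>> import torch
>>> from easycv.models.backbones.swin_transformer_dynamic import window_partition, window_reverse
>>> x = torch.rand(2, 14, 14, 96)              # (B, H, W, C)
>>> wins = window_partition(x, window_size=7)  # (num_windows*B, 7, 7, C)
>>> wins.shape
torch.Size([8, 7, 7, 96])
>>> x_back = window_reverse(wins, 7, 14, 14)   # back to (B, H, W, C)
>>> x_back.shape
torch.Size([2, 14, 14, 96])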
- class easycv.models.backbones.swin_transformer_dynamic.WindowAttention(dim, window_size, num_heads, qkv_bias=True, qk_scale=None, attn_drop=0.0, proj_drop=0.0)[source]¶
Bases:
torch.nn.modules.module.Module
Window based multi-head self attention (W-MSA) module with relative position bias. It supports both shifted and non-shifted windows.
- Parameters
dim (int) – Number of input channels.
window_size (tuple[int]) – The height and width of the window.
num_heads (int) – Number of attention heads.
qkv_bias (bool, optional) – If True, add a learnable bias to query, key, value. Default: True
qk_scale (float | None, optional) – Override default qk scale of head_dim ** -0.5 if set
attn_drop (float, optional) – Dropout ratio of attention weight. Default: 0.0
proj_drop (float, optional) – Dropout ratio of output. Default: 0.0
- __init__(dim, window_size, num_heads, qkv_bias=True, qk_scale=None, attn_drop=0.0, proj_drop=0.0)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x, mask=None)[source]¶
- Parameters
x – input features with shape of (num_windows*B, N, C)
mask – (0/-inf) mask with shape of (num_windows, Wh*Ww, Wh*Ww) or None
- extra_repr() → str[source]¶
Set the extra representation of the module
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- training: bool¶
- class easycv.models.backbones.swin_transformer_dynamic.SwinTransformerBlock(dim, input_resolution, num_heads, window_size=7, shift_size=0, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, drop=0.0, attn_drop=0.0, drop_path=0.0, act_layer=<class 'torch.nn.modules.activation.GELU'>, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>)[source]¶
Bases:
torch.nn.modules.module.Module
Swin Transformer Block.
- Parameters
dim (int) – Number of input channels.
input_resolution (tuple[int]) – Input resolution.
num_heads (int) – Number of attention heads.
window_size (int) – Window size.
shift_size (int) – Shift size for SW-MSA.
mlp_ratio (float) – Ratio of mlp hidden dim to embedding dim.
qkv_bias (bool, optional) – If True, add a learnable bias to query, key, value. Default: True
qk_scale (float | None, optional) – Override default qk scale of head_dim ** -0.5 if set.
drop (float, optional) – Dropout rate. Default: 0.0
attn_drop (float, optional) – Attention dropout rate. Default: 0.0
drop_path (float, optional) – Stochastic depth rate. Default: 0.0
act_layer (nn.Module, optional) – Activation layer. Default: nn.GELU
norm_layer (nn.Module, optional) – Normalization layer. Default: nn.LayerNorm
- __init__(dim, input_resolution, num_heads, window_size=7, shift_size=0, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, drop=0.0, attn_drop=0.0, drop_path=0.0, act_layer=<class 'torch.nn.modules.activation.GELU'>, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- extra_repr() → str[source]¶
Set the extra representation of the module
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- training: bool¶
- class easycv.models.backbones.swin_transformer_dynamic.PatchMerging(input_resolution, dim, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>)[source]¶
Bases:
torch.nn.modules.module.Module
Patch Merging Layer.
- Parameters
input_resolution (tuple[int]) – Resolution of input feature.
dim (int) – Number of input channels.
norm_layer (nn.Module, optional) – Normalization layer. Default: nn.LayerNorm
- __init__(input_resolution, dim, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Forward function.
- Parameters
x – Input feature, tensor of size (B, H*W, C).
H – Spatial resolution of the input feature.
W – Spatial resolution of the input feature.
- extra_repr() → str[source]¶
Set the extra representation of the module
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- training: bool¶
- class easycv.models.backbones.swin_transformer_dynamic.BasicLayer(dim, input_resolution, depth, num_heads, window_size, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, drop=0.0, attn_drop=0.0, drop_path=0.0, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>, downsample=None)[source]¶
Bases:
torch.nn.modules.module.Module
A basic Swin Transformer layer for one stage.
- Parameters
dim (int) – Number of input channels.
input_resolution (tuple[int]) – Input resolution.
depth (int) – Number of blocks.
num_heads (int) – Number of attention heads.
window_size (int) – Window size.
mlp_ratio (float) – Ratio of mlp hidden dim to embedding dim.
qkv_bias (bool, optional) – If True, add a learnable bias to query, key, value. Default: True
qk_scale (float | None, optional) – Override default qk scale of head_dim ** -0.5 if set.
drop (float, optional) – Dropout rate. Default: 0.0
attn_drop (float, optional) – Attention dropout rate. Default: 0.0
drop_path (float | tuple[float], optional) – Stochastic depth rate. Default: 0.0
norm_layer (nn.Module, optional) – Normalization layer. Default: nn.LayerNorm
downsample (nn.Module | None, optional) – Downsample layer at the end of the layer. Default: None
- __init__(dim, input_resolution, depth, num_heads, window_size, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, drop=0.0, attn_drop=0.0, drop_path=0.0, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>, downsample=None)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- extra_repr() → str[source]¶
Set the extra representation of the module
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- training: bool¶
- class easycv.models.backbones.swin_transformer_dynamic.PatchEmbed(img_size=224, patch_size=16, in_chans=3, embed_dim=768, norm_layer=None)[source]¶
Bases:
torch.nn.modules.module.Module
Image to Patch Embedding
- __init__(img_size=224, patch_size=16, in_chans=3, embed_dim=768, norm_layer=None)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.swin_transformer_dynamic.SwinTransformer(img_size=224, patch_size=4, in_chans=3, num_classes=1000, embed_dim=96, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24], window_size=7, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>, ape=False, patch_norm=True, use_dense_prediction=False, **kwargs)[source]¶
Bases:
torch.nn.modules.module.Module
- Swin Transformer
A PyTorch impl of “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows”.
- Parameters
img_size (int | tuple(int)) – Input image size.
patch_size (int | tuple(int)) – Patch size.
in_chans (int) – Number of input channels.
num_classes (int) – Number of classes for classification head.
embed_dim (int) – Embedding dimension.
depths (tuple(int)) – Depth of Swin Transformer layers.
num_heads (tuple(int)) – Number of attention heads in different layers.
window_size (int) – Window size.
mlp_ratio (float) – Ratio of mlp hidden dim to embedding dim.
qkv_bias (bool) – If True, add a learnable bias to query, key, value. Default: True
qk_scale (float) – Override default qk scale of head_dim ** -0.5 if set.
drop_rate (float) – Dropout rate.
attn_drop_rate (float) – Attention dropout rate.
drop_path_rate (float) – Stochastic depth rate.
norm_layer (nn.Module) – normalization layer.
ape (bool) – If True, add absolute position embedding to the patch embedding.
patch_norm (bool) – If True, add normalization after patch embedding.
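Example (an illustrative usage sketch, not from the original docstring; it assumes the default 224x224 configuration shown in the signature):
>>> import torch
>>> from easycv.models.backbones.swin_transformer_dynamic import SwinTransformer
>>> model = SwinTransformer(img_size=224, patch_size=4, embed_dim=96,
...                         depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24])
>>> out = model(torch.rand(1, 3, 224, 224))  # pooled features or logits, depending on num_classes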
- __init__(img_size=224, patch_size=4, in_chans=3, num_classes=1000, embed_dim=96, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24], window_size=7, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>, ape=False, patch_norm=True, use_dense_prediction=False, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- easycv.models.backbones.swin_transformer_dynamic.dynamic_swin_tiny_p4_w7_224(pretrained=False, **kwargs)[source]¶
easycv.models.backbones.vit_transfomer_dynamic module¶
Mostly copy-pasted from the timm library: https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/vision_transformer.py
Dynamic input support is borrowed from https://github.com/microsoft/esvit/blob/main/models/vision_transformer.py
- easycv.models.backbones.vit_transfomer_dynamic.drop_path(x, drop_prob: float = 0.0, training: bool = False)[source]¶
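A minimal usage sketch of the stochastic-depth helper above (per-sample dropping of residual-branch outputs during training; argument names follow the signature shown):
>>> import torch
>>> from easycv.models.backbones.vit_transfomer_dynamic import drop_path
>>> x = torch.rand(4, 197, 768)
>>> y = drop_path(x, drop_prob=0.1, training=True)  # some samples are zeroed, the rest rescaled by the keep probability
>>> y.shape == x.shape
True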
- class easycv.models.backbones.vit_transfomer_dynamic.DropPath(drop_prob=None)[source]¶
Bases:
torch.nn.modules.module.Module
Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
- __init__(drop_prob=None)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.vit_transfomer_dynamic.Mlp(in_features, hidden_features=None, out_features=None, act_layer=<class 'torch.nn.modules.activation.GELU'>, drop=0.0)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(in_features, hidden_features=None, out_features=None, act_layer=<class 'torch.nn.modules.activation.GELU'>, drop=0.0)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.vit_transfomer_dynamic.Attention(dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0.0, proj_drop=0.0)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0.0, proj_drop=0.0)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.vit_transfomer_dynamic.Block(dim, num_heads, mlp_ratio=4.0, qkv_bias=False, qk_scale=None, drop=0.0, attn_drop=0.0, drop_path=0.0, act_layer=<class 'torch.nn.modules.activation.GELU'>, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(dim, num_heads, mlp_ratio=4.0, qkv_bias=False, qk_scale=None, drop=0.0, attn_drop=0.0, drop_path=0.0, act_layer=<class 'torch.nn.modules.activation.GELU'>, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x, return_attention=False)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.vit_transfomer_dynamic.PatchEmbed(img_size=224, patch_size=16, in_chans=3, embed_dim=768)[source]¶
Bases:
torch.nn.modules.module.Module
Image to Patch Embedding
- __init__(img_size=224, patch_size=16, in_chans=3, embed_dim=768)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.vit_transfomer_dynamic.VisionTransformer(img_size=[224], patch_size=16, in_chans=3, num_classes=0, embed_dim=768, depth=12, num_heads=12, mlp_ratio=4.0, qkv_bias=False, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>, use_dense_prediction=False, global_pool=False, **kwargs)[source]¶
Bases:
torch.nn.modules.module.Module
Vision Transformer
- __init__(img_size=[224], patch_size=16, in_chans=3, num_classes=0, embed_dim=768, depth=12, num_heads=12, mlp_ratio=4.0, qkv_bias=False, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>, use_dense_prediction=False, global_pool=False, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- easycv.models.backbones.vit_transfomer_dynamic.dynamic_deit_tiny_p16(patch_size=16, **kwargs)[source]¶
- easycv.models.backbones.vit_transfomer_dynamic.dynamic_deit_small_p16(patch_size=16, **kwargs)[source]¶
- easycv.models.backbones.vit_transfomer_dynamic.dynamic_vit_base_p16(patch_size=16, **kwargs)[source]¶
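Example (an illustrative sketch; it assumes the factory functions above construct the corresponding dynamic ViT variants with the VisionTransformer defaults, notably num_classes=0):
>>> import torch
>>> from easycv.models.backbones.vit_transfomer_dynamic import dynamic_vit_base_p16
>>> model = dynamic_vit_base_p16(patch_size=16)
>>> out = model(torch.rand(1, 3, 224, 224))  # embeddings rather than logits when num_classes=0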
easycv.models.backbones.xcit_transformer module¶
Implementation of the Cross-Covariance Image Transformer (XCiT), based on the timm and DeiT code bases: https://github.com/rwightman/pytorch-image-models/tree/master/timm and https://github.com/facebookresearch/deit/
Part of the code is borrowed from https://github.com/facebookresearch/xcit/blob/master/xcit.py
- class easycv.models.backbones.xcit_transformer.PositionalEncodingFourier(hidden_dim=32, dim=768, temperature=10000)[source]¶
Bases:
torch.nn.modules.module.Module
Positional encoding relying on a Fourier kernel, matching the one used in the “Attention Is All You Need” paper. The implementation builds on the DETR code: https://github.com/facebookresearch/detr/blob/master/models/position_encoding.py
- __init__(hidden_dim=32, dim=768, temperature=10000)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(B, H, W)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- easycv.models.backbones.xcit_transformer.conv3x3(in_planes, out_planes, stride=1)[source]¶
3x3 convolution with padding
- class easycv.models.backbones.xcit_transformer.ConvPatchEmbed(img_size=224, patch_size=16, in_chans=3, embed_dim=768)[source]¶
Bases:
torch.nn.modules.module.Module
Image to Patch Embedding using multiple convolutional layers
- __init__(img_size=224, patch_size=16, in_chans=3, embed_dim=768)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x, padding_size=None)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.xcit_transformer.LPI(in_features, hidden_features=None, out_features=None, act_layer=<class 'torch.nn.modules.activation.GELU'>, drop=0.0, kernel_size=3)[source]¶
Bases:
torch.nn.modules.module.Module
Local Patch Interaction module that allows explicit communication between tokens in 3x3 windows to augment the implicit communication performed by the block-diagonal scatter attention. Implemented using two layers of separable 3x3 convolutions with GELU and BatchNorm2d.
- __init__(in_features, hidden_features=None, out_features=None, act_layer=<class 'torch.nn.modules.activation.GELU'>, drop=0.0, kernel_size=3)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x, H, W)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.xcit_transformer.ClassAttention(dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0.0, proj_drop=0.0)[source]¶
Bases:
torch.nn.modules.module.Module
Class Attention Layer as in CaiT https://arxiv.org/abs/2103.17239
- __init__(dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0.0, proj_drop=0.0)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.xcit_transformer.ClassAttentionBlock(dim, num_heads, mlp_ratio=4.0, qkv_bias=False, qk_scale=None, drop=0.0, attn_drop=0.0, drop_path=0.0, act_layer=<class 'torch.nn.modules.activation.GELU'>, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>, eta=None, tokens_norm=False)[source]¶
Bases:
torch.nn.modules.module.Module
Class Attention Layer as in CaiT https://arxiv.org/abs/2103.17239
- __init__(dim, num_heads, mlp_ratio=4.0, qkv_bias=False, qk_scale=None, drop=0.0, attn_drop=0.0, drop_path=0.0, act_layer=<class 'torch.nn.modules.activation.GELU'>, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>, eta=None, tokens_norm=False)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x, H, W, mask=None)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.xcit_transformer.XCA(dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0.0, proj_drop=0.0)[source]¶
Bases:
torch.nn.modules.module.Module
Cross-Covariance Attention (XCA) operation where the channels are updated using a weighted sum.
The weights are obtained from the (softmax-normalized) cross-covariance matrix (Q^T K, of size d_h × d_h).
- __init__(dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0.0, proj_drop=0.0)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.xcit_transformer.XCABlock(dim, num_heads, mlp_ratio=4.0, qkv_bias=False, qk_scale=None, drop=0.0, attn_drop=0.0, drop_path=0.0, act_layer=<class 'torch.nn.modules.activation.GELU'>, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>, num_tokens=196, eta=None)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(dim, num_heads, mlp_ratio=4.0, qkv_bias=False, qk_scale=None, drop=0.0, attn_drop=0.0, drop_path=0.0, act_layer=<class 'torch.nn.modules.activation.GELU'>, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>, num_tokens=196, eta=None)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x, H, W)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.backbones.xcit_transformer.XCiT(img_size=224, patch_size=16, in_chans=3, num_classes=1000, embed_dim=768, depth=12, num_heads=12, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, norm_layer=None, cls_attn_layers=2, use_pos=True, patch_proj='linear', eta=None, tokens_norm=False)[source]¶
Bases:
torch.nn.modules.module.Module
Based on timm and DeiT code bases https://github.com/rwightman/pytorch-image-models/tree/master/timm https://github.com/facebookresearch/deit/
- __init__(img_size=224, patch_size=16, in_chans=3, num_classes=1000, embed_dim=768, depth=12, num_heads=12, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, norm_layer=None, cls_attn_layers=2, use_pos=True, patch_proj='linear', eta=None, tokens_norm=False)[source]¶
- Parameters
img_size (int, tuple) – input image size
patch_size (int, tuple) – patch size
in_chans (int) – number of input channels
num_classes (int) – number of classes for classification head
embed_dim (int) – embedding dimension
depth (int) – depth of transformer
num_heads (int) – number of attention heads
mlp_ratio (int) – ratio of mlp hidden dim to embedding dim
qkv_bias (bool) – enable bias for qkv if True
qk_scale (float) – override default qk scale of head_dim ** -0.5 if set
drop_rate (float) – dropout rate
attn_drop_rate (float) – attention dropout rate
drop_path_rate (float) – stochastic depth rate
norm_layer – (nn.Module): normalization layer
cls_attn_layers – (int) Depth of Class attention layers
use_pos – (bool) whether to use positional encoding
eta – (float) layerscale initialization value
tokens_norm – (bool) Whether to normalize all tokens or just the cls_token in the CA
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
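A minimal usage sketch for the XCiT backbone documented above. The constructor arguments follow the signature shown here; the small-variant hyperparameters (embed_dim=384, depth=12, num_heads=8, eta=1.0) are assumptions borrowed from the original XCiT paper, not verified EasyCV defaults, and the exact output format depends on the EasyCV implementation:
import torch
from easycv.models.backbones.xcit_transformer import XCiT

# assumed small-variant hyperparameters; eta is the layerscale init value
backbone = XCiT(img_size=224, patch_size=16, embed_dim=384, depth=12,
                num_heads=8, eta=1.0, tokens_norm=True)
backbone.eval()

with torch.no_grad():
    dummy = torch.randn(1, 3, 224, 224)  # ImageNet-style input
    out = backbone(dummy)                # output format depends on the implementation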
easycv.models.classification package¶
Submodules¶
easycv.models.classification.classification module¶
- easycv.models.classification.classification.distill_loss(cls_score, teacher_score, tempreature=1.0)[source]¶
Soft cross entropy loss
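A hedged usage sketch for distill_loss with the documented signature; the [N, num_classes] logit shapes are illustrative assumptions. Note that the keyword argument is spelled tempreature in the signature above:
import torch
from easycv.models.classification.classification import distill_loss

student_logits = torch.randn(8, 1000)  # assumed [N, num_classes]
teacher_logits = torch.randn(8, 1000)
loss = distill_loss(student_logits, teacher_logits, tempreature=4.0)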
- class easycv.models.classification.classification.Classification(backbone, train_preprocess=[], with_sobel=False, head=None, neck=None, teacher=None, pretrained=None, mixup_cfg=None)[source]¶
Bases:
easycv.models.base.BaseModel
- __init__(backbone, train_preprocess=[], with_sobel=False, head=None, neck=None, teacher=None, pretrained=None, mixup_cfg=None)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward_backbone(img: torch.Tensor) → List[torch.Tensor][source]¶
Forward backbone
- Returns
backbone outputs
- Return type
x (tuple)
- forward_train(img, gt_labels) → Dict[str, torch.Tensor][source]¶
In forward_train, the model forwards backbone + neck / multi-neck to get a list of output tensors, then passes this list to the head / multi-head to compute each loss
- forward_test(img: torch.Tensor) → Dict[str, torch.Tensor][source]¶
forward_test generates prob/class predictions from an image; only one neck + one head is supported
- forward_test_label(img, gt_labels) → Dict[str, torch.Tensor][source]¶
forward_test_label generates prob/class predictions from an image; only one neck + one head is supported. Note: the head init needs to set the input feature index
- training: bool¶
- forward_feature(img) → Dict[str, torch.Tensor][source]¶
- Forward feature forwards backbone + neck / multi-neck and returns a dict of output features.
self.neck_num = 0: only forward the backbone, output the backbone feature with avgpool under key neck; self.neck_num > 0: has one or multiple necks, output each neck's feature under key neck_neckidx_featureidx, such as neck_0_0
- Returns
feature tensor
- Return type
x (torch.Tensor)
- forward(img: torch.Tensor, mode: str = 'train', gt_labels: Optional[torch.Tensor] = None, img_metas: Optional[torch.Tensor] = None) → Dict[str, torch.Tensor][source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
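A hedged sketch of assembling a Classification model from config dicts and calling the documented forward modes. The concrete backbone/head config keys ('ResNet', depth=50, the ClsHead fields) are assumptions modelled on typical EasyCV configs rather than verified defaults:
import torch
from easycv.models.classification.classification import Classification

# backbone/head dicts are assumed registry configs, not verified defaults
model = Classification(
    backbone=dict(type='ResNet', depth=50),
    head=dict(type='ClsHead', with_avg_pool=True, in_channels=2048, num_classes=1000),
)

img = torch.randn(2, 3, 224, 224)
labels = torch.randint(0, 1000, (2,))

losses = model(img, mode='train', gt_labels=labels)  # dict of training losses
with torch.no_grad():
    preds = model(img, mode='test')                   # dict of prob/class outputs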
easycv.models.classification.necks module¶
- class easycv.models.classification.necks.LinearNeck(in_channels, out_channels, with_avg_pool=True, with_norm=False)[source]¶
Bases:
torch.nn.modules.module.Module
Linear neck: fc only
- __init__(in_channels, out_channels, with_avg_pool=True, with_norm=False)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.classification.necks.RetrivalNeck(in_channels, out_channels, with_avg_pool=True, cdg_config=['G', 'M'])[source]¶
Bases:
torch.nn.modules.module.Module
- RetrivalNeck: refer to Combination of Multiple Global Descriptors for Image Retrieval.
CGD feature: only uses avg pooling + GeM pooling + max pooling, via pool -> fc -> norm -> concat -> norm. Avg feature: uses avg pooling, avg pool -> syncbn -> fc.
len(cdg_config) > 0: return [CGD, Avg]; len(cdg_config) == 0: return [Avg]
- __init__(in_channels, out_channels, with_avg_pool=True, cdg_config=['G', 'M'])[source]¶
Init RetrivalNeck. This neck does not pool the input feature map and does not support dynamic input
- Parameters
in_channels – Int - input feature map channels
out_channels – Int - output feature map channels
with_avg_pool – bool, whether to apply avg pool for the BNNeck or not
cdg_config – list of 'G', 'M', 'S' to configure the output feature; CGD = [gempooling] + [maxpooling] + [meanpooling]. If len(cdg_config) > 0: return [CGD, Avg]; if len(cdg_config) == 0: return [Avg]
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.classification.necks.FaceIDNeck(in_channels, out_channels, map_shape=1, dropout_ratio=0.4, with_norm=False, bn_type='SyncBN')[source]¶
Bases:
torch.nn.modules.module.Module
FaceID neck: Include BN, dropout, flatten, linear, bn
- __init__(in_channels, out_channels, map_shape=1, dropout_ratio=0.4, with_norm=False, bn_type='SyncBN')[source]¶
Init FaceIDNeck. The FaceID neck does not pool the input feature map and does not support dynamic input
- Parameters
in_channels – Int - input feature map channels
out_channels – Int - output feature map channels
map_shape – int or list(int, ...), input feature map (w, h), or w when w == h
dropout_ratio – float, drop out ratio
with_norm – normalize output feature or not
bn_type – SyncBN or BN
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.classification.necks.MultiLinearNeck(in_channels, out_channels, num_layers=1, with_avg_pool=True)[source]¶
Bases:
torch.nn.modules.module.Module
MultiLinearNeck neck: MultiFc head
- __init__(in_channels, out_channels, num_layers=1, with_avg_pool=True)[source]¶
- Parameters
in_channels – int or list[int]
out_channels – int or list[int]
num_layers – total fc num
with_avg_pool – input will be avgPool if True
- Returns
None
- Raises
an error if len(in_channels) != len(out_channels) or len(in_channels) != num_layers
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.classification.necks.HRFuseScales(in_channels, out_channels=2048, norm_cfg={'momentum': 0.1, 'type': 'BN'})[source]¶
Bases:
torch.nn.modules.module.Module
Fuse feature maps of multiple scales in HRNet.
- Parameters
in_channels (list[int]) – The input channels of all scales.
out_channels (int) – The channels of the fused feature map. Defaults to 2048.
norm_cfg (dict) – dictionary to construct norm layers. Defaults to dict(type='BN', momentum=0.1).
init_cfg (dict | list[dict], optional) – Initialization config dict. Defaults to dict(type='Normal', layer='Linear', std=0.01).
- __init__(in_channels, out_channels=2048, norm_cfg={'momentum': 0.1, 'type': 'BN'})[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- training: bool¶
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
easycv.models.detection package¶
Subpackages¶
easycv.models.detection.yolox package¶
- class easycv.models.detection.yolox.yolo_head.YOLOXHead(num_classes, width=1.0, strides=[8, 16, 32], in_channels=[256, 512, 1024], act='silu', depthwise=False, stage='CLOUD')[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(num_classes, width=1.0, strides=[8, 16, 32], in_channels=[256, 512, 1024], act='silu', depthwise=False, stage='CLOUD')[source]¶
- Parameters
num_classes (int) – detection class numbers.
width (float) – model width. Default value: 1.0.
strides (list) – expanded strides. Default value: [8, 16, 32].
in_channels (list) – model conv channels set. Default value: [256, 512, 1024].
act (str) – activation type of conv. Default value: “silu”.
depthwise (bool) – whether apply depthwise conv in conv branch. Default value: False.
stage (str) – model stage, distinguish edge head to cloud head. Default value: CLOUD.
- forward(xin, labels=None, imgs=None)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- get_losses(imgs, x_shifts, y_shifts, expanded_strides, labels, outputs, origin_preds, dtype)[source]¶
- get_assignments(batch_idx, num_gt, total_num_anchors, gt_bboxes_per_image, gt_classes, bboxes_preds_per_image, expanded_strides, x_shifts, y_shifts, cls_preds, bbox_preds, obj_preds, labels, imgs, mode='gpu')[source]¶
- get_in_boxes_info(gt_bboxes_per_image, expanded_strides, x_shifts, y_shifts, total_num_anchors, num_gt)[source]¶
- training: bool¶
- class easycv.models.detection.yolox.yolo_pafpn.YOLOPAFPN(depth=1.0, width=1.0, in_features=('dark3', 'dark4', 'dark5'), in_channels=[256, 512, 1024], depthwise=False, act='silu')[source]¶
Bases:
torch.nn.modules.module.Module
YOLOv3 model. Darknet 53 is the default backbone of this model.
- __init__(depth=1.0, width=1.0, in_features=('dark3', 'dark4', 'dark5'), in_channels=[256, 512, 1024], depthwise=False, act='silu')[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(input)[source]¶
- Parameters
inputs – input images.
- Returns
FPN feature.
- Return type
Tuple[Tensor]
- training: bool¶
- class easycv.models.detection.yolox.yolox.YOLOX(model_type: str = 's', num_classes: int = 80, test_size: tuple = (640, 640), test_conf: float = 0.01, nms_thre: float = 0.65, pretrained: Optional[str] = None)[source]¶
Bases:
easycv.models.base.BaseModel
YOLOX model module. The module list is defined by create_yolov3_modules function. The network returns loss values from three YOLO layers during training and detection results during test.
- param_map = {'l': [1.0, 1.0], 'm': [0.67, 0.75], 'nano': [0.33, 0.25], 's': [0.33, 0.5], 'tiny': [0.33, 0.375], 'x': [1.33, 1.25]}¶
- __init__(model_type: str = 's', num_classes: int = 80, test_size: tuple = (640, 640), test_conf: float = 0.01, nms_thre: float = 0.65, pretrained: Optional[str] = None)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward_train(img: torch.Tensor, gt_bboxes: torch.Tensor, gt_labels: torch.Tensor, img_metas=None, scale=None) → Dict[str, torch.Tensor][source]¶
Abstract interface for model forward in training
- Parameters
img (Tensor) – image tensor, NxCxHxW
target (List[Tensor]) – list of target tensor, NTx5 [class,x_c,y_c,w,h]
- forward_test(img: torch.Tensor, img_metas=None) → torch.Tensor[source]¶
Abstract interface for model forward in testing
- Parameters
img (Tensor) – image tensor, NxCxHxW
target (List[Tensor]) – list of target tensor, NTx5 [class,x_c,y_c,w,h]
- forward(img, mode='compression', **kwargs)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
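A hedged sketch of building the YOLOX detector with the documented constructor and running the test-time forward on a dummy image matching test_size; the exact structure of the returned detections is not specified here:
import torch
from easycv.models.detection.yolox.yolox import YOLOX

model = YOLOX(model_type='s', num_classes=80, test_size=(640, 640),
              test_conf=0.01, nms_thre=0.65)
model.eval()

with torch.no_grad():
    img = torch.randn(1, 3, 640, 640)  # NxCxHxW, matching test_size
    dets = model.forward_test(img)     # post-NMS detections; format per implementation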
easycv.models.detection.yolox_edge package¶
- class easycv.models.detection.yolox_edge.yolox_edge.YOLOX_EDGE(stage: str = 'EDGE', model_type: str = 's', num_classes: int = 80, test_size: tuple = (640, 640), test_conf: float = 0.01, nms_thre: float = 0.65, pretrained: Optional[str] = None, depth: float = 1.0, width: float = 1.0, max_model_params: float = 0.0, max_model_flops: float = 0.0, activation: str = 'silu', in_channels: list = [256, 512, 1024], backbone=None, head=None)[source]¶
Bases:
easycv.models.detection.yolox.yolox.YOLOX
YOLOX model module. The module list is defined by create_yolov3_modules function. The network returns loss values from three YOLO layers during training and detection results during test.
- __init__(stage: str = 'EDGE', model_type: str = 's', num_classes: int = 80, test_size: tuple = (640, 640), test_conf: float = 0.01, nms_thre: float = 0.65, pretrained: Optional[str] = None, depth: float = 1.0, width: float = 1.0, max_model_params: float = 0.0, max_model_flops: float = 0.0, activation: str = 'silu', in_channels: list = [256, 512, 1024], backbone=None, head=None)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- training: bool¶
easycv.models.heads package¶
Submodules¶
easycv.models.heads.cls_head module¶
- class easycv.models.heads.cls_head.ClsHead(with_avg_pool=False, label_smooth=0.0, in_channels=2048, with_fc=True, num_classes=1000, loss_config={'type': 'CrossEntropyLossWithLabelSmooth'}, input_feature_index=[0])[source]¶
Bases:
torch.nn.modules.module.Module
Simplest classifier head, with only one fc layer. Note: by the Evtorch module design, the input is always a feature list, i.e. feature_list = [tensor, tensor, ...]
- __init__(with_avg_pool=False, label_smooth=0.0, in_channels=2048, with_fc=True, num_classes=1000, loss_config={'type': 'CrossEntropyLossWithLabelSmooth'}, input_feature_index=[0])[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x: List[torch.Tensor]) → List[torch.Tensor][source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- loss(cls_score: List[torch.Tensor], labels: torch.Tensor) → Dict[str, torch.Tensor][source]¶
- Parameters
cls_score – [N x num_classes]
labels – if mixup is not used, the shape is [N]; otherwise [N x num_classes]
- training: bool¶
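A hedged sketch of the ClsHead call pattern: the head consumes a feature list and returns a list of class scores, and loss() consumes those scores with labels. The [N, 2048] feature shape assumes with_avg_pool=False and already-pooled features:
import torch
from easycv.models.heads.cls_head import ClsHead

head = ClsHead(with_avg_pool=False, in_channels=2048, num_classes=1000)

features = [torch.randn(4, 2048)]       # feature_list = [tensor, ...]
labels = torch.randint(0, 1000, (4,))

cls_score = head(features)              # list of [N x num_classes] scores
losses = head.loss(cls_score, labels)   # dict of loss tensors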
easycv.models.heads.contrastive_head module¶
- class easycv.models.heads.contrastive_head.ContrastiveHead(temperature=0.1)[source]¶
Bases:
torch.nn.modules.module.Module
Head for contrastive learning.
- __init__(temperature=0.1)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(pos, neg)[source]¶
- Parameters
pos (Tensor) – Nx1 positive similarity
neg (Tensor) – Nxk negative similarity
- training: bool¶
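A hedged usage sketch for ContrastiveHead, following the documented forward(pos, neg) interface where pos is an Nx1 positive-similarity tensor and neg an Nxk negative-similarity tensor; shapes are illustrative:
import torch
from easycv.models.heads.contrastive_head import ContrastiveHead

head = ContrastiveHead(temperature=0.1)
pos = torch.randn(32, 1)     # Nx1 positive similarities
neg = torch.randn(32, 4096)  # Nxk negative similarities
out = head(pos, neg)         # typically a dict containing the contrastive loss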
- class easycv.models.heads.contrastive_head.DebiasedContrastiveHead(temperature=0.1, tau=0.1)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(temperature=0.1, tau=0.1)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(pos, neg)[source]¶
- Parameters
pos (Tensor) – Nx1 positive similarity
neg (Tensor) – Nxk negative similarity
- training: bool¶
easycv.models.heads.latent_pred_head module¶
- class easycv.models.heads.latent_pred_head.LatentPredictHead(predictor, size_average=True)[source]¶
Bases:
torch.nn.modules.module.Module
Head for contrastive learning.
- __init__(predictor, size_average=True)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(input, target)[source]¶
- Parameters
input (Tensor) – NxC input features.
target (Tensor) – NxC target features.
- training: bool¶
- class easycv.models.heads.latent_pred_head.LatentClsHead(predictor)[source]¶
Bases:
torch.nn.modules.module.Module
Head for contrastive learning.
- __init__(predictor)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(input, target)[source]¶
- Parameters
input (Tensor) – NxC input features.
target (Tensor) – NxC target features.
- training: bool¶
easycv.models.heads.mp_metric_head module¶
- easycv.models.heads.mp_metric_head.EmbeddingExplansion(embs, labels, explanion_rate=4, alpha=1.0)[source]¶
Expand embeddings (CVPR; refer to https://github.com/clovaai/embedding-expansion): combine PK-sampled data and mixup anchor-positive pairs to generate more features; usually combined with BatchHardMiner. Results on SOP and CUB still need to be added.
- Parameters
embs – [N , dims] tensor
labels – [N] tensor
explanion_rate – to expand N to explanion_rate * N
alpha – beta distribution parameter for mixup
- Returns
[N*explanion_rate , dims]
- Return type
embs
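A hedged usage sketch for EmbeddingExplansion with the documented signature; per the docstring the result has shape [N * explanion_rate, dims]. The tensor shapes below are illustrative only:
import torch
from easycv.models.heads.mp_metric_head import EmbeddingExplansion

embs = torch.randn(64, 512)          # [N, dims] PK-sampled embeddings
labels = torch.randint(0, 8, (64,))  # [N] labels
expanded = EmbeddingExplansion(embs, labels, explanion_rate=4, alpha=1.0)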
- class easycv.models.heads.mp_metric_head.MpMetrixHead(with_avg_pool=False, in_channels=2048, loss_config=[{'type': 'CircleLoss', 'loss_weight': 1.0, 'norm': True, 'ddp': True, 'm': 0.4, 'gamma': 80}], input_feature_index=[0], input_label_index=0, ignore_label=None)[source]¶
Bases:
torch.nn.modules.module.Module
Simplest classifier head, with only one fc layer.
- __init__(with_avg_pool=False, in_channels=2048, loss_config=[{'type': 'CircleLoss', 'loss_weight': 1.0, 'norm': True, 'ddp': True, 'm': 0.4, 'gamma': 80}], input_feature_index=[0], input_label_index=0, ignore_label=None)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x: List[torch.Tensor]) → List[torch.Tensor][source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
easycv.models.heads.multi_cls_head module¶
- class easycv.models.heads.multi_cls_head.MultiClsHead(pool_type='adaptive', in_indices=(0), with_last_layer_unpool=False, backbone='resnet50', norm_cfg={'type': 'BN'}, num_classes=1000)[source]¶
Bases:
torch.nn.modules.module.Module
Multiple classifier heads.
- FEAT_CHANNELS = {'resnet50': [64, 256, 512, 1024, 2048]}¶
- FEAT_LAST_UNPOOL = {'resnet50': 100352}¶
- __init__(pool_type='adaptive', in_indices=(0), with_last_layer_unpool=False, backbone='resnet50', norm_cfg={'type': 'BN'}, num_classes=1000)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
easycv.models.loss package¶
Submodules¶
easycv.models.loss.iou_loss module¶
- class easycv.models.loss.iou_loss.IOUloss(reduction='none', loss_type='iou')[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(reduction='none', loss_type='iou')[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(pred, target)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
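A hedged usage sketch for IOUloss. The box layout is an assumption: YOLOX commonly represents boxes as (cx, cy, w, h) tensors of shape [N, 4]; check the implementation before relying on this:
import torch
from easycv.models.loss.iou_loss import IOUloss

criterion = IOUloss(reduction='mean', loss_type='iou')
pred = torch.tensor([[50.0, 50.0, 20.0, 20.0]])    # assumed (cx, cy, w, h)
target = torch.tensor([[52.0, 48.0, 22.0, 18.0]])
loss = criterion(pred, target)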
easycv.models.loss.mse_loss module¶
- class easycv.models.loss.mse_loss.JointsMSELoss(use_target_weight=False, loss_weight=1.0)[source]¶
Bases:
torch.nn.modules.module.Module
MSE loss for heatmaps.
- Parameters
use_target_weight (bool) – Option to use weighted MSE loss. Different joint types may have different target weights.
loss_weight (float) – Weight of the loss. Default: 1.0.
- __init__(use_target_weight=False, loss_weight=1.0)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- training: bool¶
easycv.models.loss.pytorch_metric_learning module¶
- class easycv.models.loss.pytorch_metric_learning.FocalLoss2d(gamma=2, weight=None, size_average=None, reduce=None, reduction='mean', num_classes=2)[source]¶
Bases:
torch.nn.modules.loss._WeightedLoss
- __init__(gamma=2, weight=None, size_average=None, reduce=None, reduction='mean', num_classes=2)[source]¶
FocalLoss2d: a loss to address the class-imbalance problem in 2-class classification
- Parameters
gamma – focal loss param Gamma
weight – weight same as loss._WeightedLoss
size_average – size_average same as loss._WeightedLoss
reduce – reduce same as loss._WeightedLoss
reduction – reduction same as loss._WeightedLoss
num_classes – fixed to 2
- Returns
Focalloss nn.module.loss object
- reduction: str¶
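A hedged usage sketch for FocalLoss2d on a binary problem (num_classes is fixed to 2 per the docstring). The input/target shapes and the forward(input, target) call pattern are assumptions based on its _WeightedLoss base class:
import torch
from easycv.models.loss.pytorch_metric_learning import FocalLoss2d

criterion = FocalLoss2d(gamma=2, num_classes=2)
logits = torch.randn(8, 2)            # assumed [N, 2] logits
target = torch.randint(0, 2, (8,))    # assumed [N] integer labels
loss = criterion(logits, target)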
- class easycv.models.loss.pytorch_metric_learning.DistributeMSELoss[source]¶
Bases:
torch.nn.modules.module.Module
- forward(input, target)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.loss.pytorch_metric_learning.CrossEntropyLossWithLabelSmooth(label_smooth=0.1, temperature=1.0, with_cls=False, embedding_size=512, num_classes=10000)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(label_smooth=0.1, temperature=1.0, with_cls=False, embedding_size=512, num_classes=10000)[source]¶
A softmax loss with label_smooth and fc (to fit the pytorch metric learning interface).
- Parameters
label_smooth – label smoothing value, default=0.1
with_cls – if True, will generate a nn.Linear to transform the input embedding from embedding_size to num_classes
embedding_size – if the input is a feature rather than logits, indicates the embedding shape
num_classes – if the input is a feature rather than logits, indicates the classification num_classes
- Returns
None
- forward(input, target)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.loss.pytorch_metric_learning.AMSoftmaxLoss(embedding_size=512, num_classes=100000, margin=0.35, scale=30)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(embedding_size=512, num_classes=100000, margin=0.35, scale=30)[source]¶
AMSoftmax loss with fc (to fit the pytorch metric learning interface). Paper: https://arxiv.org/pdf/1801.05599.pdf
- Parameters
embedding_size – forward input [N, embedding_size]
num_classes – classification num_classes
margin – AMSoftmax param
scale – AMSoftmax param; should be increased as num_classes grows
- forward(x, lb)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.loss.pytorch_metric_learning.ModelParallelSoftmaxLoss(embedding_size=512, num_classes=100000, scale=None, margin=None, bias=True)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(embedding_size=512, num_classes=100000, scale=None, margin=None, bias=True)[source]¶
ModelParallel Softmax by sailfish
- Parameters
embedding_size – forward input [N, embedding_size]
num_classes – classification num_classes
- forward(x, lb)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.loss.pytorch_metric_learning.ModelParallelAMSoftmaxLoss(embedding_size=512, num_classes=100000, margin=0.35, scale=30)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(embedding_size=512, num_classes=100000, margin=0.35, scale=30)[source]¶
ModelParallel AMSoftmax by sailfish
- Parameters
embedding_size – forward input [N, embedding_size]
num_classes – classification num_classes
- forward(x, lb)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.loss.pytorch_metric_learning.SoftTargetCrossEntropy(num_classes=1000, **kwargs)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(num_classes=1000, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x: torch.Tensor, target: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
easycv.models.pose package¶
Subpackages¶
easycv.models.pose.heads package¶
- class easycv.models.pose.heads.topdown_heatmap_base_head.TopdownHeatmapBaseHead[source]¶
Bases:
torch.nn.modules.module.Module
Base class for top-down heatmap heads.
All top-down heatmap heads should subclass it. All subclasses should override:
get_loss, to calculate the loss.
get_accuracy, to calculate accuracy.
forward, to run the model forward.
inference_model, to run inference.
- decode(img_metas, output, **kwargs)[source]¶
Decode keypoints from heatmaps.
- Parameters
img_metas (list(dict)) – Information about data augmentation. By default this includes: "image_file": path to the image file; "center": center of the bbox; "scale": scale of the bbox; "rotation": rotation of the bbox; "bbox_score": score of the bbox.
output (np.ndarray[N, K, H, W]) – model predicted heatmaps.
- training: bool¶
- class easycv.models.pose.heads.topdown_heatmap_simple_head.TopdownHeatmapSimpleHead(in_channels, out_channels, num_deconv_layers=3, num_deconv_filters=(256, 256, 256), num_deconv_kernels=(4, 4, 4), extra=None, in_index=0, input_transform=None, align_corners=False, loss_keypoint=None, train_cfg=None, test_cfg=None)[source]¶
Bases:
easycv.models.pose.heads.topdown_heatmap_base_head.TopdownHeatmapBaseHead
Top-down heatmap simple head. Paper ref: Bin Xiao et al., Simple Baselines for Human Pose Estimation and Tracking.
TopdownHeatmapSimpleHead consists of (>=0) deconv layers and a simple conv2d layer.
- Parameters
in_channels (int) – Number of input channels
out_channels (int) – Number of output channels
num_deconv_layers (int) – Number of deconv layers. num_deconv_layers should >= 0. Note that 0 means no deconv layers.
num_deconv_filters (list|tuple) – Number of filters. If num_deconv_layers > 0, the length of num_deconv_filters should equal num_deconv_layers.
num_deconv_kernels (list|tuple) – Kernel sizes.
in_index (int|Sequence[int]) – Input feature index. Default: 0
input_transform (str|None) –
Transformation type of input features. Options: ‘resize_concat’, ‘multiple_select’, None. Default: None.
- ‘resize_concat’: Multiple feature maps will be resized to the same size as the first one and then concatenated together. Usually used in the FCN head of HRNet.
- ‘multiple_select’: Multiple feature maps will be bundled into a list and passed into the decode head.
- None: Only one selected feature map is allowed.
align_corners (bool) – align_corners argument of F.interpolate. Default: False.
loss_keypoint (dict) – Config for keypoint loss. Default: None.
- __init__(in_channels, out_channels, num_deconv_layers=3, num_deconv_filters=(256, 256, 256), num_deconv_kernels=(4, 4, 4), extra=None, in_index=0, input_transform=None, align_corners=False, loss_keypoint=None, train_cfg=None, test_cfg=None)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- get_loss(output, target, target_weight)[source]¶
Calculate top-down keypoint loss.
Note
batch_size: N; num_keypoints: K; heatmaps height: H; heatmaps width: W
- Parameters
output (torch.Tensor[NxKxHxW]) – Output heatmaps.
target (torch.Tensor[NxKxHxW]) – Target heatmaps.
target_weight (torch.Tensor[NxKx1]) – Weights across different joint types.
- get_accuracy(output, target, target_weight)[source]¶
Calculate accuracy for top-down keypoint loss.
Note
batch_size: N; num_keypoints: K; heatmaps height: H; heatmaps width: W
- Parameters
output (torch.Tensor[NxKxHxW]) – Output heatmaps.
target (torch.Tensor[NxKxHxW]) – Target heatmaps.
target_weight (torch.Tensor[NxKx1]) – Weights across different joint types.
- inference_model(x, flip_pairs=None)[source]¶
Inference function.
- Returns
Output heatmaps.
- Return type
output_heatmap (np.ndarray)
- Parameters
x (torch.Tensor[NxKxHxW]) – Input features.
flip_pairs (None | list[tuple]) – Pairs of keypoints which are mirrored.
- training: bool¶
Submodules¶
easycv.models.pose.top_down module¶
- class easycv.models.pose.top_down.TopDown(backbone, neck=None, keypoint_head=None, train_cfg=None, test_cfg=None, pretrained=None, loss_pose=None)[source]¶
Bases:
easycv.models.base.BaseModel
Top-down pose detectors.
- Parameters
backbone (dict) – Backbone modules to extract feature.
keypoint_head (dict) – Keypoint head to process feature.
train_cfg (dict) – Config for training. Default: None.
test_cfg (dict) – Config for testing. Default: None.
pretrained (str) – Path to the pretrained models.
loss_pose (None) – Deprecated arguments. Please use loss_keypoint for heads instead.
- __init__(backbone, neck=None, keypoint_head=None, train_cfg=None, test_cfg=None, pretrained=None, loss_pose=None)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- property with_neck¶
Check if has neck.
- property with_keypoint¶
Check if has keypoint_head.
- forward_train(img, target, target_weight, img_metas, **kwargs)[source]¶
Defines the computation performed at every call when training.
- forward_test(img, img_metas, return_heatmap=False, **kwargs)[source]¶
Defines the computation performed at every call when testing.
- show_result(img, result, skeleton=None, kpt_score_thr=0.3, bbox_color='green', pose_kpt_color=None, pose_link_color=None, text_color='white', radius=4, thickness=1, font_scale=0.5, bbox_thickness=1, win_name='', show=False, show_keypoint_weight=False, wait_time=0, out_file=None)[source]¶
Draw result over img.
- Parameters
img (str or Tensor) – The image to be displayed.
result (list[dict]) – The results to draw over img (bbox_result, pose_result).
skeleton (list[list]) – The connection of keypoints. skeleton is 0-based indexing.
kpt_score_thr (float, optional) – Minimum score of keypoints to be shown. Default: 0.3.
bbox_color (str or tuple or Color) – Color of bbox lines.
pose_kpt_color (np.array[Nx3]) – Color of N keypoints. If None, do not draw keypoints.
pose_link_color (np.array[Mx3]) – Color of M links. If None, do not draw links.
text_color (str or tuple or Color) – Color of texts.
radius (int) – Radius of circles.
thickness (int) – Thickness of lines.
font_scale (float) – Font scales of texts.
win_name (str) – The window name.
show (bool) – Whether to show the image. Default: False.
show_keypoint_weight (bool) – Whether to change the transparency using the predicted confidence scores of keypoints.
wait_time (int) – Value of waitKey param. Default: 0.
out_file (str or None) – The filename to write the image. Default: None.
- Returns
Visualized img, only if not show or out_file.
- Return type
Tensor
- training: bool¶
easycv.models.selfsup package¶
Submodules¶
easycv.models.selfsup.byol module¶
- class easycv.models.selfsup.byol.BYOL(backbone, neck=None, head=None, pretrained=None, base_momentum=0.996, **kwargs)[source]¶
Bases:
easycv.models.base.BaseModel
BYOL unofficial implementation. Paper: https://arxiv.org/abs/2006.07733
- __init__(backbone, neck=None, head=None, pretrained=None, base_momentum=0.996, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward_train(img, **kwargs)[source]¶
Abstract interface for model forward in training
- Parameters
img (Tensor) – image tensor
kwargs (keyword arguments) – Specific to concrete implementation.
- forward_test(img, **kwargs)[source]¶
Abstract interface for model forward in testing
- Parameters
img (Tensor) – image tensor
kwargs (keyword arguments) – Specific to concrete implementation.
- forward(img, mode='train', **kwargs)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
easycv.models.selfsup.dino module¶
- class easycv.models.selfsup.dino.MultiCropWrapper(backbone, head)[source]¶
Bases:
torch.nn.modules.module.Module
Perform forward pass separately on each resolution input. The inputs corresponding to a single resolution are clubbed and single forward is run on the same resolution inputs. Hence we do several forward passes = number of different resolutions used. We then concatenate all the output features and run the head forward on these concatenated features.
- __init__(backbone, head)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
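A conceptual sketch (plain PyTorch, not the EasyCV class itself) of the multi-crop pattern MultiCropWrapper describes: crops that share a resolution are concatenated and forwarded in one pass, and all features are then concatenated before the head. The toy backbone/head and crop sizes are placeholders:
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten())  # toy feature extractor
head = nn.Linear(3, 8)                                           # toy projection head

# two global 224x224 crops + four local 96x96 crops (a typical DINO-style setup)
crops = [torch.randn(2, 3, 224, 224) for _ in range(2)] + \
        [torch.randn(2, 3, 96, 96) for _ in range(4)]

feats = []
for res in sorted({c.shape[-1] for c in crops}, reverse=True):
    group = torch.cat([c for c in crops if c.shape[-1] == res])  # club same-resolution crops
    feats.append(backbone(group))                                # one forward per resolution
out = head(torch.cat(feats))                                     # head on concatenated features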
- class easycv.models.selfsup.dino.DINOLoss(out_dim, ncrops, warmup_teacher_temp, teacher_temp, warmup_teacher_temp_epochs, nepochs, device, student_temp=0.1, center_momentum=0.9)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(out_dim, ncrops, warmup_teacher_temp, teacher_temp, warmup_teacher_temp_epochs, nepochs, device, student_temp=0.1, center_momentum=0.9)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(student_output, teacher_output, epoch)[source]¶
Cross-entropy between softmax outputs of the teacher and student networks.
- training: bool¶
- class easycv.models.selfsup.dino.DINO(backbone, train_preprocess=[], neck=None, config=None, pretrained=None)[source]¶
Bases:
easycv.models.base.BaseModel
- __init__(backbone, train_preprocess=[], neck=None, config=None, pretrained=None)[source]¶
Init DINO
- Parameters
backbone – backbone config to build vision backbone
train_preprocess – [gaussBlur, mixUp, solarize]
neck – neck config to build the DINO neck
config – DINO parameter config
- forward_train(inputs)[source]¶
Abstract interface for model forward in training
- Parameters
img (Tensor) – image tensor
kwargs (keyword arguments) – Specific to concrete implementation.
- forward_test(img, **kwargs)[source]¶
Abstract interface for model forward in testing
- Parameters
img (Tensor) – image tensor
kwargs (keyword arguments) – Specific to concrete implementation.
- forward_feature(img, **kwargs)[source]¶
Forward backbone
- Returns
feature tensor
- Return type
x (torch.Tensor)
- training: bool¶
- forward(img, gt_label=None, mode='train', extract_list=['neck'], **kwargs)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
easycv.models.selfsup.mae module¶
- class easycv.models.selfsup.mae.MAE(backbone, neck, mask_ratio=0.75, norm_pix_loss=True, **kwargs)[source]¶
Bases:
easycv.models.base.BaseModel
- __init__(backbone, neck, mask_ratio=0.75, norm_pix_loss=True, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- patchify(imgs)[source]¶
convert image to patch
- Parameters
imgs – (N, 3, H, W)
- Returns
(N, L, patch_size**2 *3)
- Return type
x
- forward_loss(imgs, pred, mask)[source]¶
compute loss
- Parameters
imgs – (N, 3, H, W)
pred – (N, L, p*p*3)
mask – (N, L), 0 is keep, 1 is remove,
- forward_train(img, **kwargs)[source]¶
Abstract interface for model forward in training
- Parameters
img (Tensor) – image tensor
kwargs (keyword arguments) – Specific to concrete implementation.
- forward_test(img, **kwargs)[source]¶
Abstract interface for model forward in testing
- Parameters
img (Tensor) – image tensor
kwargs (keyword arguments) – Specific to concrete implementation.
- forward(img, mode='train', **kwargs)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
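A conceptual sketch (plain PyTorch) of the patchify shape transformation the MAE docstring describes: (N, 3, H, W) -> (N, L, patch_size**2 * 3), with L = (H/p) * (W/p). This mirrors the usual MAE reference layout; the EasyCV method may differ in details:
import torch

def patchify_sketch(imgs: torch.Tensor, p: int = 16) -> torch.Tensor:
    # imgs: (N, C, H, W) with H and W divisible by the patch size p
    n, c, h, w = imgs.shape
    assert h % p == 0 and w % p == 0
    x = imgs.reshape(n, c, h // p, p, w // p, p)
    x = torch.einsum('nchpwq->nhwpqc', x)          # group pixels by patch
    return x.reshape(n, (h // p) * (w // p), p * p * c)

imgs = torch.randn(2, 3, 224, 224)
patches = patchify_sketch(imgs)                     # -> (2, 196, 768)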
easycv.models.selfsup.mixco module¶
- class easycv.models.selfsup.mixco.MIXCO(backbone, train_preprocess=[], neck=None, head=None, mixco_head=None, pretrained=None, queue_len=65536, feat_dim=128, momentum=0.999, **kwargs)[source]¶
Bases:
easycv.models.selfsup.moco.MOCO
MOCO.
A mixup version of MOCO: https://arxiv.org/pdf/2010.06300.pdf
- __init__(backbone, train_preprocess=[], neck=None, head=None, mixco_head=None, pretrained=None, queue_len=65536, feat_dim=128, momentum=0.999, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward_train(img, **kwargs)[source]¶
Abstract interface for model forward in training
- Parameters
img (Tensor) – image tensor
kwargs (keyword arguments) – Specific to concrete implementation.
- training: bool¶
easycv.models.selfsup.moby module¶
- class easycv.models.selfsup.moby.MoBY(backbone, train_preprocess=[], neck=None, head=None, pretrained=None, queue_len=4096, contrast_temperature=0.2, momentum=0.99, online_drop_path_rate=0.2, target_drop_path_rate=0.0, **kwargs)[source]¶
Bases:
easycv.models.base.BaseModel
MoBY. Part of the code is borrowed from: https://github.com/SwinTransformer/Transformer-SSL/blob/main/models/moby.py.
- __init__(backbone, train_preprocess=[], neck=None, head=None, pretrained=None, queue_len=4096, contrast_temperature=0.2, momentum=0.99, online_drop_path_rate=0.2, target_drop_path_rate=0.0, **kwargs)[source]¶
Init Moby
- Parameters
backbone – backbone config to build vision backbone
train_preprocess – [gaussBlur, mixUp, solarize]
neck – neck config to build Moby Neck
head – head config to build Moby Neck
pretrained – pretrained weight for backbone
queue_len – moby queue length
contrast_temperature – contrastive_loss temperature
momentum – ema target weights momentum
online_drop_path_rate – for transformer based backbone, set online model drop_path_rate
target_drop_path_rate – for transformer based backbone, set target model drop_path_rate
- forward_train(img, **kwargs)[source]¶
Abstract interface for model forward in training
- Parameters
img (Tensor) – image tensor
kwargs (keyword arguments) – Specific to concrete implementation.
- forward_test(img, **kwargs)[source]¶
Abstract interface for model forward in testing
- Parameters
img (Tensor) – image tensor
kwargs (keyword arguments) – Specific to concrete implementation.
- forward_feature(img, **kwargs)[source]¶
Forward backbone
- Returns
feature tensor
- Return type
x (torch.Tensor)
- forward(img, gt_label=None, mode='train', extract_list=['neck'], **kwargs)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
easycv.models.selfsup.moco module¶
- class easycv.models.selfsup.moco.MOCO(backbone, train_preprocess=[], neck=None, head=None, pretrained=None, queue_len=65536, feat_dim=128, momentum=0.999, **kwargs)[source]¶
Bases:
easycv.models.base.BaseModel
MOCO. Part of the code is borrowed from: https://github.com/facebookresearch/moco/blob/master/moco/builder.py.
- __init__(backbone, train_preprocess=[], neck=None, head=None, pretrained=None, queue_len=65536, feat_dim=128, momentum=0.999, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward_train(img, **kwargs)[source]¶
Abstract interface for model forward in training
- Parameters
img (Tensor) – image tensor
kwargs (keyword arguments) – Specific to concrete implementation.
- forward_test(img, **kwargs)[source]¶
Abstract interface for model forward in testing
- Parameters
img (Tensor) – image tensor
kwargs (keyword arguments) – Specific to concrete implementation.
- forward_feature(img, **kwargs)[source]¶
Forward backbone
- Returns
feature tensor
- Return type
x (torch.Tensor)
- forward(img, gt_label=None, mode='train', extract_list=['neck'], **kwargs)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
easycv.models.selfsup.necks module¶
- class easycv.models.selfsup.necks.DINONeck(in_dim, out_dim, use_bn=False, norm_last_layer=True, nlayers=3, hidden_dim=2048, bottleneck_dim=256)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(in_dim, out_dim, use_bn=False, norm_last_layer=True, nlayers=3, hidden_dim=2048, bottleneck_dim=256)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.selfsup.necks.MoBYMLP(in_channels=256, hid_channels=4096, out_channels=256, num_layers=2, with_avg_pool=True)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(in_channels=256, hid_channels=4096, out_channels=256, num_layers=2, with_avg_pool=True)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.selfsup.necks.NonLinearNeckSwav(in_channels, hid_channels, out_channels, with_avg_pool=True, export=False)[source]¶
Bases:
torch.nn.modules.module.Module
The non-linear neck in byol: fc-syncbn-relu-fc
- __init__(in_channels, hid_channels, out_channels, with_avg_pool=True, export=False)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.selfsup.necks.NonLinearNeckV0(in_channels, hid_channels, out_channels, sync_bn=False, with_avg_pool=True)[source]¶
Bases:
torch.nn.modules.module.Module
The non-linear neck in ODC, fc-bn-relu-dropout-fc-relu
- __init__(in_channels, hid_channels, out_channels, sync_bn=False, with_avg_pool=True)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.selfsup.necks.NonLinearNeckV1(in_channels, hid_channels, out_channels, with_avg_pool=True)[source]¶
Bases:
torch.nn.modules.module.Module
The non-linear neck in MoCO v2: fc-relu-fc
- __init__(in_channels, hid_channels, out_channels, with_avg_pool=True)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.selfsup.necks.NonLinearNeckV2(in_channels, hid_channels, out_channels, with_avg_pool=True)[source]¶
Bases:
torch.nn.modules.module.Module
The non-linear neck in byol: fc-bn-relu-fc
- __init__(in_channels, hid_channels, out_channels, with_avg_pool=True)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.selfsup.necks.NonLinearNeckSimCLR(in_channels, hid_channels, out_channels, num_layers=2, with_avg_pool=True)[source]¶
Bases:
torch.nn.modules.module.Module
SimCLR non-linear neck.
- Structure: fc(no_bias)-bn(has_bias)-[relu-fc(no_bias)-bn(no_bias)].
The substructures in [] can be repeated. For the SimCLR default setting, the repeat time is 1.
- However, PyTorch does not support specifying (weight=True, bias=False).
It only supports “affine”, which includes both the weight and the bias. Hence, the second BatchNorm has a bias in this implementation. This is different from the official implementation of SimCLR.
- Since SyncBatchNorm in pytorch<1.4.0 does not support 2D input, the input is
expanded to 4D with shape: (N,C,1,1). I am not sure if this workaround has no bugs. See the pull request here: https://github.com/pytorch/pytorch/pull/29626
- Parameters
in_channels – input channel number
hid_channels – hidden channels
out_channels – output channel number
num_layers (int) – number of fc layers, it is 2 in the SimCLR default setting.
with_avg_pool – output with average pooling
- __init__(in_channels, hid_channels, out_channels, num_layers=2, with_avg_pool=True)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.selfsup.necks.RelativeLocNeck(in_channels, out_channels, sync_bn=False, with_avg_pool=True)[source]¶
Bases:
torch.nn.modules.module.Module
Relative patch location neck: fc-bn-relu-dropout
- __init__(in_channels, out_channels, sync_bn=False, with_avg_pool=True)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.selfsup.necks.MAENeck(num_patches, embed_dim=768, patch_size=16, in_chans=3, decoder_embed_dim=512, decoder_depth=8, decoder_num_heads=16, mlp_ratio=4.0, norm_layer=functools.partial(<class 'torch.nn.modules.normalization.LayerNorm'>, eps=1e-06))[source]¶
Bases:
torch.nn.modules.module.Module
MAE decoder
- Parameters
num_patches (int) – number of patches from encoder
embed_dim (int) – encoder embedding dimension
patch_size (int) – encoder patch size
in_chans (int) – input image channels
decoder_embed_dim (int) – decoder embedding dimension
decoder_depth (int) – number of decoder layers
decoder_num_heads (int) – Parallel attention heads
mlp_ratio (float) – mlp ratio
norm_layer – type of normalization layer
- __init__(num_patches, embed_dim=768, patch_size=16, in_chans=3, decoder_embed_dim=512, decoder_depth=8, decoder_num_heads=16, mlp_ratio=4.0, norm_layer=functools.partial(<class 'torch.nn.modules.normalization.LayerNorm'>, eps=1e-06))[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- training: bool¶
- forward(x, ids_restore)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
easycv.models.selfsup.simclr module¶
- class easycv.models.selfsup.simclr.SimCLR(backbone, train_preprocess=[], neck=None, head=None, pretrained=None)[source]¶
Bases:
easycv.models.base.BaseModel
- __init__(backbone, train_preprocess=[], neck=None, head=None, pretrained=None)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward_train(img, **kwargs)[source]¶
Abstract interface for model forward in training
- Parameters
img (Tensor) – image tensor
kwargs (keyword arguments) – Specific to concrete implementation.
- forward_test(img, **kwargs)[source]¶
Abstract interface for model forward in testing
- Parameters
img (Tensor) – image tensor
kwargs (keyword arguments) – Specific to concrete implementation.
- forward(img, mode='train', **kwargs)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
easycv.models.selfsup.swav module¶
- class easycv.models.selfsup.swav.SWAV(backbone, train_preprocess=[], neck=None, config=None, pretrained=None)[source]¶
Bases:
easycv.models.base.BaseModel
- __init__(backbone, train_preprocess=[], neck=None, config=None, pretrained=None)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward_train(inputs)[source]¶
Abstract interface for model forward in training
- Parameters
img (Tensor) – image tensor
kwargs (keyword arguments) – Specific to concrete implementation.
- forward_test(img, **kwargs)[source]¶
Abstract interface for model forward in testing
- Parameters
img (Tensor) – image tensor
kwargs (keyword arguments) – Specific to concrete implementation.
- forward_feature(img, **kwargs)[source]¶
Forward backbone
- Returns
feature tensor
- Return type
x (torch.Tensor)
- forward(img, gt_label=None, mode='train', extract_list=['neck'], **kwargs)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.selfsup.swav.MultiPrototypes(output_dim, nmb_prototypes)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(output_dim, nmb_prototypes)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
easycv.models.utils package¶
Submodules¶
easycv.models.utils.accuracy module¶
- easycv.models.utils.accuracy.accuracy(pred, target, topk=1)[source]¶
- Parameters
pred – [N x num_classes]
target – [num_classes]
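For illustration, a minimal usage sketch (shapes and values here are made up, and an integer topk returning a single value is an assumption based on the common convention this helper follows):
import torch
from easycv.models.utils.accuracy import accuracy

pred = torch.randn(8, 10)              # [N x num_classes] logits
target = torch.randint(0, 10, (8,))    # one class index per sample
top1 = accuracy(pred, target, topk=1)  # top-1 accuracy over the batch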
- class easycv.models.utils.accuracy.Accuracy(topk=(1))[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(topk=(1))[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(pred, target)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
easycv.models.utils.activation module¶
- class easycv.models.utils.activation.FReLU(in_channel)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(in_channel)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
easycv.models.utils.conv_module module¶
- easycv.models.utils.conv_module.build_conv_layer(cfg, *args, **kwargs)[source]¶
Build convolution layer
- Parameters
cfg (None or dict) – cfg should contain: type (str): identify conv layer type. layer args: args needed to instantiate a conv layer.
- Returns
created conv layer
- Return type
layer (nn.Module)
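A small sketch of how this builder is typically called; that cfg=None falls back to a plain nn.Conv2d is an assumption borrowed from the mmcv convention this helper mirrors:
from easycv.models.utils.conv_module import build_conv_layer

# positional args are forwarded to the conv layer: (in_channels, out_channels)
conv = build_conv_layer(None, 3, 16, kernel_size=3, stride=1, padding=1)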
- class easycv.models.utils.conv_module.ConvModule(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias='auto', conv_cfg=None, norm_cfg=None, activation='relu', inplace=True, order=('conv', 'norm', 'act'))[source]¶
Bases:
torch.nn.modules.module.Module
A conv block that contains conv/norm/activation layers.
- Parameters
in_channels (int) – Same as nn.Conv2d.
out_channels (int) – Same as nn.Conv2d.
kernel_size (int or tuple[int]) – Same as nn.Conv2d.
stride (int or tuple[int]) – Same as nn.Conv2d.
padding (int or tuple[int]) – Same as nn.Conv2d.
dilation (int or tuple[int]) – Same as nn.Conv2d.
groups (int) – Same as nn.Conv2d.
bias (bool or str) – If specified as auto, it will be decided by the norm_cfg. Bias will be set as True if norm_cfg is None, otherwise False.
conv_cfg (dict) – Config dict for convolution layer.
norm_cfg (dict) – Config dict for normalization layer.
activation (str or None) – Activation type, “ReLU” by default.
inplace (bool) – Whether to use inplace mode for activation.
order (tuple[str]) – The order of conv/norm/activation layers. It is a sequence of “conv”, “norm” and “act”. Examples are (“conv”, “norm”, “act”) and (“act”, “conv”, “norm”).
- __init__(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias='auto', conv_cfg=None, norm_cfg=None, activation='relu', inplace=True, order=('conv', 'norm', 'act'))[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- property norm¶
- forward(x, activate=True, norm=True)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
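As a quick sketch of the conv/norm/act composition described above (shapes are illustrative):
import torch
from easycv.models.utils.conv_module import ConvModule

# conv + BN + ReLU; with a norm_cfg present, bias='auto' resolves to False
block = ConvModule(3, 16, kernel_size=3, padding=1,
                   norm_cfg=dict(type='BN'), activation='relu')
out = block(torch.randn(2, 3, 32, 32))   # -> [2, 16, 32, 32]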
easycv.models.utils.conv_ws module¶
- easycv.models.utils.conv_ws.conv_ws_2d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1, eps=1e-05)[source]¶
- class easycv.models.utils.conv_ws.ConvWS2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, eps=1e-05)[source]¶
Bases:
torch.nn.modules.conv.Conv2d
- __init__(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, eps=1e-05)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- bias: Optional[torch.Tensor]¶
- out_channels: int¶
- kernel_size: Tuple[int, ...]¶
- stride: Tuple[int, ...]¶
- padding: Union[str, Tuple[int, ...]]¶
- dilation: Tuple[int, ...]¶
- transposed: bool¶
- output_padding: Tuple[int, ...]¶
- groups: int¶
- padding_mode: str¶
- weight: torch.Tensor¶
easycv.models.utils.dist_utils module¶
- class easycv.models.utils.dist_utils.DistributedLossWrapper(loss, **kwargs)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(loss, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(embeddings, labels, *args, **kwargs)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.utils.dist_utils.DistributedMinerWrapper(miner)[source]¶
Bases:
torch.nn.modules.module.Module
- __init__(miner)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(embeddings, labels, ref_emb=None, ref_labels=None)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
easycv.models.utils.gather_layer module¶
- class easycv.models.utils.gather_layer.GatherLayer(*args, **kwargs)[source]¶
Bases:
torch.autograd.function.Function
Gather tensors from all process, supporting backward propagation.
- static forward(ctx, input)[source]¶
Performs the operation.
This function is to be overridden by all subclasses.
It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types).
The context can be used to store arbitrary data that can then be retrieved during the backward pass. Tensors should not be stored directly on ctx (though this is not currently enforced for backward compatibility). Instead, tensors should be saved either with ctx.save_for_backward() if they are intended to be used in backward (equivalently, vjp) or ctx.save_for_forward() if they are intended to be used in jvp.
- static backward(ctx, *grads)[source]¶
Defines a formula for differentiating the operation with backward mode automatic differentiation (alias to the vjp function).
This function is to be overridden by all subclasses.
It must accept a context ctx as the first argument, followed by as many outputs as the forward() returned (None will be passed in for non-tensor outputs of the forward function), and it should return as many tensors as there were inputs to forward(). Each argument is the gradient w.r.t the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Tensor or is a Tensor not requiring grads, you can just pass None as a gradient for that input.
The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad as a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.
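A hedged sketch of how such a gather op is typically used in a distributed contrastive loss; it assumes torch.distributed has already been initialised and that apply() returns one tensor per rank:
import torch
from easycv.models.utils.gather_layer import GatherLayer

# local embeddings on this rank; gradients still flow back to them after the gather
embeddings = torch.randn(8, 128, requires_grad=True)
gathered = torch.cat(GatherLayer.apply(embeddings), dim=0)  # [world_size * 8, 128]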
easycv.models.utils.init_weights module¶
easycv.models.utils.multi_pooling module¶
- class easycv.models.utils.multi_pooling.GeMPooling(p=3, eps=1e-06)[source]¶
Bases:
torch.nn.modules.module.Module
GeM pooling used for image retrieval. p = 1: average pooling. p > 1: increases the contrast of the pooled feature map and focuses on the salient features of the image. p = infinity: spatial max-pooling layer.
- __init__(p=3, eps=1e-06)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
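A minimal sketch, assuming a standard 4-D feature map as input:
import torch
from easycv.models.utils.multi_pooling import GeMPooling

gem = GeMPooling(p=3)                      # p=1 is average pooling, larger p approaches max pooling
pooled = gem(torch.randn(2, 2048, 7, 7))   # pools over the 7x7 spatial grid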
- class easycv.models.utils.multi_pooling.MultiPooling(pool_type='adaptive', in_indices=(0), backbone='resnet50')[source]¶
Bases:
torch.nn.modules.module.Module
Pooling layers for features from multiple depth.
- POOL_PARAMS = {'resnet50': [{'kernel_size': 10, 'stride': 10, 'padding': 4}, {'kernel_size': 16, 'stride': 8, 'padding': 0}, {'kernel_size': 13, 'stride': 5, 'padding': 0}, {'kernel_size': 8, 'stride': 3, 'padding': 0}, {'kernel_size': 6, 'stride': 1, 'padding': 0}]}¶
- POOL_SIZES = {'resnet50': [12, 6, 4, 3, 2]}¶
- POOL_DIMS = {'resnet50': [9216, 9216, 8192, 9216, 8192]}¶
- __init__(pool_type='adaptive', in_indices=(0), backbone='resnet50')[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.utils.multi_pooling.MultiAvgPooling(pool_type='adaptive', in_indices=(0), backbone='resnet50')[source]¶
Bases:
torch.nn.modules.module.Module
Pooling layers for features from multiple depth.
- POOL_PARAMS = {'resnet50': [{'kernel_size': 10, 'stride': 10, 'padding': 4}, {'kernel_size': 16, 'stride': 8, 'padding': 0}, {'kernel_size': 13, 'stride': 5, 'padding': 0}, {'kernel_size': 8, 'stride': 3, 'padding': 0}, {'kernel_size': 7, 'stride': 1, 'padding': 0}]}¶
- POOL_SIZES = {'resnet50': [12, 6, 4, 3, 1]}¶
- POOL_DIMS = {'resnet50': [9216, 9216, 8192, 9216, 2048]}¶
- __init__(pool_type='adaptive', in_indices=(0), backbone='resnet50')[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- training: bool¶
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
easycv.models.utils.norm module¶
- class easycv.models.utils.norm.SyncIBN(planes, ratio=0.5, eps=1e-05)[source]¶
Bases:
torch.nn.modules.module.Module
Instance-Batch Normalization layer from “Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net” (https://arxiv.org/pdf/1807.09441.pdf).
- Parameters
planes (int) – Number of channels for the input tensor
ratio (float) – Ratio of instance normalization in the IBN layer
- __init__(planes, ratio=0.5, eps=1e-05)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class easycv.models.utils.norm.IBN(planes, ratio=0.5, eps=1e-05)[source]¶
Bases:
torch.nn.modules.module.Module
Instance-Batch Normalization layer from “Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net” (https://arxiv.org/pdf/1807.09441.pdf).
- Parameters
planes (int) – Number of channels for the input tensor
ratio (float) – Ratio of instance normalization in the IBN layer
- __init__(planes, ratio=0.5, eps=1e-05)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- easycv.models.utils.norm.build_norm_layer(cfg, num_features, postfix='')[source]¶
Build normalization layer
- Parameters
cfg (dict) – cfg should contain: type (str): identify norm layer type. layer args: args needed to instantiate a norm layer. requires_grad (bool): [optional] whether stop gradient updates
num_features (int) – number of channels from input.
postfix (int, str) – appended into norm abbreviation to create named layer.
- Returns
(name, layer) – name (str) is the norm abbreviation + postfix; layer (nn.Module) is the created norm layer
- Return type
tuple
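A small usage sketch; the returned abbreviation ('bn1' here) is an assumption based on the postfix convention described above:
from easycv.models.utils.norm import build_norm_layer

name, layer = build_norm_layer(dict(type='BN'), num_features=64, postfix=1)
# name is the norm abbreviation plus the postfix, e.g. 'bn1'; layer is the created nn.Module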
easycv.models.utils.ops module¶
- easycv.models.utils.ops.resize_tensor(input, size=None, scale_factor=None, mode='nearest', align_corners=None, warning=True)[source]¶
Resize tensor with F.interpolate.
- Parameters
input (Tensor) – the input tensor.
size (Tuple[int, int]) – output spatial size.
scale_factor (float or Tuple[float]) – multiplier for spatial size. If scale_factor is a tuple, its length has to match input.dim().
mode (str) – algorithm used for upsampling: ‘nearest’ | ‘linear’ | ‘bilinear’ | ‘bicubic’ | ‘trilinear’ | ‘area’. Default: ‘nearest’
align_corners (bool) –
Geometrically, we consider the pixels of the input and output as squares rather than points. If set to True, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels.
If set to False, the input and output tensors are aligned by the corner points of their corner pixels, and the interpolation uses edge value padding for out-of-boundary values, making this operation independent of input size when scale_factor is kept the same. This only has an effect when mode is ‘linear’, ‘bilinear’, ‘bicubic’ or ‘trilinear’.
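For example (a thin wrapper around F.interpolate, so the call mirrors it):
import torch
from easycv.models.utils.ops import resize_tensor

x = torch.randn(1, 3, 32, 32)
y = resize_tensor(x, size=(64, 64), mode='bilinear', align_corners=False)  # -> [1, 3, 64, 64]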
easycv.models.utils.pos_embed module¶
- easycv.models.utils.pos_embed.get_2d_sincos_pos_embed(embed_dim, grid_size, cls_token=False)[source]¶
grid_size: int of the grid height and width. Returns pos_embed: [grid_size*grid_size, embed_dim] or [1+grid_size*grid_size, embed_dim] (with or without cls_token).
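For example, building a ViT-style positional embedding for a 14x14 patch grid with a cls token:
from easycv.models.utils.pos_embed import get_2d_sincos_pos_embed

pos_embed = get_2d_sincos_pos_embed(embed_dim=768, grid_size=14, cls_token=True)
# shape: [1 + 14 * 14, 768]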
easycv.models.utils.res_layer module¶
- class easycv.models.utils.res_layer.ResLayer(block, num_blocks, in_channels, out_channels, expansion=None, stride=1, avg_down=False, conv_cfg=None, norm_cfg={'type': 'BN'}, **kwargs)[source]¶
Bases:
torch.nn.modules.container.Sequential
ResLayer to build ResNet style backbone.
- Parameters
block (nn.Module) – Residual block used to build ResLayer.
num_blocks (int) – Number of blocks.
in_channels (int) – Input channels of this block.
out_channels (int) – Output channels of this block.
expansion – The expansion for BasicBlock/Bottleneck. If not specified, it will firstly be obtained via block.expansion. If the block has no attribute “expansion”, the following default values will be used: 1 for BasicBlock and 4 for Bottleneck. Default: None.
stride (int) – stride of the first block. Default: 1.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False
conv_cfg (dict, optional) – dictionary to construct and config conv layer. Default: None
norm_cfg (dict) – dictionary to construct and config norm layer. Default: dict(type=’BN’)
easycv.models.utils.scale module¶
- class easycv.models.utils.scale.Scale(scale=1.0)[source]¶
Bases:
torch.nn.modules.module.Module
A learnable scale parameter
- __init__(scale=1.0)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
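A minimal sketch of the learnable scalar in use:
import torch
from easycv.models.utils.scale import Scale

scale = Scale(scale=1.0)            # one learnable scalar, initialised to 1.0
y = scale(torch.randn(2, 4))        # elementwise multiplication by the scalar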
easycv.models.utils.sobel module¶
- class easycv.models.utils.sobel.Sobel[source]¶
Bases:
torch.nn.modules.module.Module
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
Submodules¶
easycv.models.base module¶
- class easycv.models.base.BaseModel[source]¶
Bases:
torch.nn.modules.module.Module
base class for model.
- abstract forward_train(img: torch.Tensor, **kwargs) → Dict[str, torch.Tensor][source]¶
Abstract interface for model forward in training
- Parameters
img (Tensor) – image tensor
kwargs (keyword arguments) – Specific to concrete implementation.
- forward_test(img: torch.Tensor, **kwargs) → Dict[str, torch.Tensor][source]¶
Abstract interface for model forward in testing
- Parameters
img (Tensor) – image tensor
kwargs (keyword arguments) – Specific to concrete implementation.
- forward(img, mode='train', **kwargs)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- train_step(data, optimizer)[source]¶
The iteration step during training.
This method defines an iteration step during training, except for the back propagation and optimizer updating, which are done in an optimizer hook. Note that in some complicated cases or models, the whole process including back propagation and optimizer updating is also defined in this method, such as GAN.
- Parameters
data (dict) – The output of dataloader.
optimizer (torch.optim.Optimizer | dict) – The optimizer of runner is passed to train_step(). This argument is unused and reserved.
- Returns
It should contain at least 3 keys: loss, log_vars, num_samples. loss is a tensor for back propagation, which can be a weighted sum of multiple losses. log_vars contains all the variables to be sent to the logger. num_samples indicates the batch size (when the model is DDP, it means the batch size on each GPU), which is used for averaging the logs.
- Return type
dict
- val_step(data, optimizer)[source]¶
The iteration step during validation.
This method shares the same signature as train_step(), but is used during val epochs. Note that the evaluation after training epochs is not implemented with this method, but with an evaluation hook.
- training: bool¶
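To make the train/test dispatch above concrete, here is a hedged sketch of a minimal subclass; the loss-dict convention follows the forward_train/train_step description, and the field names are illustrative:
import torch
from easycv.models.base import BaseModel

class ToyModel(BaseModel):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(8, 2)

    def forward_train(self, img, **kwargs):
        # return a dict of losses; train_step turns it into loss/log_vars/num_samples
        return dict(loss=self.fc(img).mean())

    def forward_test(self, img, **kwargs):
        return dict(prob=self.fc(img).softmax(dim=-1))

model = ToyModel()
losses = model(torch.randn(4, 8), mode='train')   # dispatched to forward_train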
easycv.models.builder module¶
easycv.models.modelzoo module¶
easycv.models.registry module¶
easycv.utils package¶
Submodules¶
easycv.utils.alias_multinomial module¶
easycv.utils.bbox_util module¶
- easycv.utils.bbox_util.batched_cxcywh2xyxy_with_shape(bboxes, shape)[source]¶
Reverse of xyxy2xywh_with_shape: transform normalized points [[x_center, y_center, box_w, box_h], …] to standard [[x1, y1, x2, y2], …].
- Parameters
bboxes – np.array or tensor like [[x_center, y_center, box_w, box_h], …], all values are normalized
shape – img shape: [h, w]
- Returns
np.array or tensor like [[x1, y1, x2, y2], …]
- easycv.utils.bbox_util.bbox_iou(box1, box2, x1y1x2y2=True, GIoU=False, DIoU=False, CIoU=False, eps=1e-09)[source]¶
- easycv.utils.bbox_util.box_iou(box1, box2)[source]¶
Return intersection-over-union (Jaccard index) of boxes. Both sets of boxes are expected to be in (x1, y1, x2, y2) format.
- Parameters
box1 (Tensor[N, 4]) –
box2 (Tensor[M, 4]) –
- Returns
the NxM matrix containing the pairwise IoU values for every element in boxes1 and boxes2
- Return type
iou (Tensor[N, M])
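A quick numeric sketch of box_iou (the coordinates are made up):
import torch
from easycv.utils.bbox_util import box_iou

boxes1 = torch.tensor([[0., 0., 10., 10.]])
boxes2 = torch.tensor([[5., 5., 15., 15.], [20., 20., 30., 30.]])
iou = box_iou(boxes1, boxes2)   # Tensor[1, 2]; roughly [[0.143, 0.0]]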
- easycv.utils.bbox_util.box_candidates(box1, box2, wh_thr=2, ar_thr=20, area_thr=0.1)[source]¶
Compute candidate boxes: box1 before augment, box2 after augment, wh_thr (pixels), aspect_ratio_thr, area_ratio
easycv.utils.checkpoint module¶
- easycv.utils.checkpoint.load_checkpoint(model, filename, map_location='cpu', strict=False, logger=None)[source]¶
Load checkpoint from a file or URI.
- Parameters
model (Module) – Module to load checkpoint.
filename (str) – Accept local filepath, URL, torchvision://xxx, open-mmlab://xxx. Please refer to docs/model_zoo.md for details.
map_location (str) – Same as torch.load().
strict (bool) – Whether to allow different params for the model and checkpoint.
logger (logging.Logger or None) – The logger for error message.
- Returns
The loaded checkpoint.
- Return type
dict or OrderedDict
- easycv.utils.checkpoint.save_checkpoint(model, filename, optimizer=None, meta=None)[source]¶
Save checkpoint to file.
The checkpoint will have 3 fields: meta, state_dict and optimizer. By default meta will contain version and time info.
- Parameters
model (Module) – Module whose params are to be saved.
filename (str) – Checkpoint filename.
optimizer (Optimizer, optional) – Optimizer to be saved.
meta (dict, optional) – Metadata to be saved in checkpoint.
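A minimal sketch of the save/load round trip for the two helpers above; the file path is purely illustrative:
import torch
from easycv.utils.checkpoint import save_checkpoint, load_checkpoint

model = torch.nn.Linear(4, 2)
save_checkpoint(model, 'epoch_1.pth', meta=dict(epoch=1))
checkpoint = load_checkpoint(model, 'epoch_1.pth', map_location='cpu')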
easycv.utils.collect module¶
- easycv.utils.collect.nondist_forward_collect(func, data_loader, length)[source]¶
Forward and collect network outputs.
This function performs forward propagation and collects outputs. It can be used to collect results, features, losses, etc.
- Parameters
func (function) – The function to process data. The output must be a dictionary of CPU tensors.
length (int) – Expected length of output arrays.
- Returns
The concatenated outputs.
- Return type
results_all (dict(np.ndarray))
- easycv.utils.collect.dist_forward_collect(func, data_loader, rank, length, ret_rank=- 1)[source]¶
Forward and collect network outputs in a distributed manner.
This function performs forward propagation and collects outputs. It can be used to collect results, features, losses, etc.
- Parameters
func (function) – The function to process data. The output must be a dictionary of CPU tensors.
rank (int) – This process id.
length (int) – Expected length of output arrays.
ret_rank (int) – The process that returns. Other processes will return None.
- Returns
The concatenated outputs.
- Return type
results_all (dict(np.ndarray))
easycv.utils.config_tools module¶
- easycv.utils.config_tools.check_base_cfg_path(base_cfg_name='configs/base.py', ori_filename=None)[source]¶
- easycv.utils.config_tools.config_dict_edit(ori_cfg_dict, cfg_dict, reg, dict_mem_helper)[source]¶
Edit ${configs.variables} in a config dict to resolve dependencies in the config.
ori_cfg_dict: used to find the true value of ${configs.variables}
cfg_dict: dict whose leaves are searched recursively
reg: regular expression pattern to find all ${configs.variables} in the leaves of the dict
dict_mem_helper: stores the true values of ${configs.variables} that have already been found
easycv.utils.constant module¶
easycv.utils.dist_utils module¶
- easycv.utils.dist_utils.obj2tensor(pyobj, device='cuda')[source]¶
Serialize picklable python object to tensor.
- easycv.utils.dist_utils.all_reduce_dict(py_dict, op='sum', group=None, to_float=True)[source]¶
Apply all reduce function for python dict object.
The code is modified from https://github.com/Megvii-BaseDetection/YOLOX/blob/main/yolox/utils/allreduce_norm.py.
NOTE: make sure that py_dict in different ranks has the same keys and the values should be in the same shape.
- Parameters
py_dict (dict) – Dict to be applied all reduce op.
op (str) – Operator, could be ‘sum’ or ‘mean’. Default: ‘sum’
group (torch.distributed.group, optional) – Distributed group. Default: None.
to_float (bool) – Whether to convert all values of dict to float. Default: True.
- Returns
reduced python dict object.
- Return type
OrderedDict
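A hedged sketch of the call above; it must run inside an initialised torch.distributed process group, and the tensors here are placeholders:
import torch
from easycv.utils.dist_utils import all_reduce_dict

# every rank must provide the same keys with same-shaped values
stats = dict(loss=torch.tensor(0.5).cuda(), acc=torch.tensor(0.9).cuda())
reduced = all_reduce_dict(stats, op='mean')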
easycv.utils.eval_utils module¶
- easycv.utils.eval_utils.generate_best_metric_name(evaluate_type, dataset_name, metric_names)[source]¶
Generate best metric name for different evaluator / different dataset / different metric_names.
evaluate_type: str
dataset_name: None or str
metric_names: None, str, list[str] or tuple(str)
- Returns
list[str]
easycv.utils.flops_counter module¶
- easycv.utils.flops_counter.get_model_info(model, input_size, model_config, logger)[source]¶
get_model_info, check model parameters and Gflops
- easycv.utils.flops_counter.get_model_complexity_info(model, input_res, print_per_layer_stat=True, as_strings=True, input_constructor=None, ost=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]¶
- easycv.utils.flops_counter.params_to_string(params_num)[source]¶
converting number to string
- Parameters
params_num (float) – number
- Returns str
number
>>> params_to_string(1e9) '1000.0 M' >>> params_to_string(2e5) '200.0 k' >>> params_to_string(3e-9) '3e-09'
- easycv.utils.flops_counter.print_model_with_flops(model, units='GMac', precision=3, ost=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]¶
- easycv.utils.flops_counter.compute_average_flops_cost(self)[source]¶
A method that will be available after add_flops_counting_methods() is called on a desired net object. Returns current mean flops consumption per image.
- easycv.utils.flops_counter.start_flops_count(self)[source]¶
A method that will be available after add_flops_counting_methods() is called on a desired net object. Activates the computation of mean flops consumption per image. Call it before you run the network.
- easycv.utils.flops_counter.stop_flops_count(self)[source]¶
A method that will be available after add_flops_counting_methods() is called on a desired net object. Stops computing the mean flops consumption per image. Call whenever you want to pause the computation.
easycv.utils.gather module¶
easycv.utils.json_utils module¶
Utilities for dealing with writing json strings.
json_utils wraps json.dump and json.dumps so that they can be used to safely control the precision of floats when writing to json strings or files.
- class easycv.utils.json_utils.MyEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]¶
Bases:
json.encoder.JSONEncoder
- default(o)[source]¶
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:
def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
- easycv.utils.json_utils.dump(obj, fid, float_digits=- 1, **params)[source]¶
Wrapper of json.dump that allows specifying the float precision used.
- Parameters
obj – The object to dump.
fid – The file id to write to.
float_digits – The number of digits of precision when writing floats out.
**params – Additional parameters to pass to json.dumps.
- easycv.utils.json_utils.dumps(obj, float_digits=- 1, **params)[source]¶
Wrapper of json.dumps that allows specifying the float precision used.
- Parameters
obj – The object to dump.
float_digits – The number of digits of precision when writing floats out.
**params – Additional parameters to pass to json.dumps.
- Returns
JSON string representation of obj.
- Return type
output
- easycv.utils.json_utils.compat_dumps(data, float_digits=- 1)[source]¶
Handle json dumps of Chinese text and numpy data.
- Parameters
data – python data structure
float_digits – The number of digits of precision when writing floats out.
- Returns
json str; in python2 the str is encoded with utf8, in python3 the str is unicode (python3 str)
- easycv.utils.json_utils.PrettyParams(**params)[source]¶
Returns parameters for use with Dump and Dumps to output pretty json.
- Example usage:
json_str = json_utils.Dumps(obj, **json_utils.PrettyParams())
json_str = json_utils.Dumps(obj, **json_utils.PrettyParams(allow_nans=False))
- Parameters
**params – Additional params to pass to json.dump or json.dumps.
- Returns
- Parameters that are compatible with json_utils.Dump and
json_utils.Dumps.
- Return type
params
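For example, controlling float precision with the dumps wrapper above (the exact rendered string is indicative):
from easycv.utils import json_utils

json_str = json_utils.dumps({'pi': 3.14159265}, float_digits=2)
# something like '{"pi": 3.14}'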
easycv.utils.logger module¶
- easycv.utils.logger.get_root_logger(log_file=None, log_level=20)[source]¶
Get the root logger.
The logger will be initialized if it has not been initialized. By default a StreamHandler will be added. If log_file is specified, a FileHandler will also be added. The name of the root logger is the top-level package name, e.g., “easycv”.
- Parameters
log_file (str | None) – The log filename. If specified, a FileHandler will be added to the root logger.
log_level (int) – The root logger level. Note that only the process of rank 0 is affected, while other processes will set the level to “Error” and be silent most of the time.
- Returns
The root logger.
- Return type
logging.Logger
- easycv.utils.logger.print_log(msg, logger=None, level=20)[source]¶
Print a log message.
- Parameters
msg (str) – The message to be logged.
logger (logging.Logger | str | None) – The logger to be used. Some special loggers are: - “root”: the root logger obtained with get_root_logger(). - “silent”: no message will be printed. - None: The print() method will be used to print log messages.
level (int) – Logging level. Only available when logger is a Logger object or “root”.
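A short usage sketch for the two helpers above (the log file path is illustrative; 20 is logging.INFO):
from easycv.utils.logger import get_root_logger, print_log

logger = get_root_logger(log_file='train.log', log_level=20)
print_log('start training', logger=logger)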
easycv.utils.metric_distance module¶
easycv.utils.misc module¶
easycv.utils.preprocess_function module¶
- easycv.utils.preprocess_function.bninceptionPre(image, mean=[104, 117, 128], std=[1, 1, 1])[source]¶
- Parameters
image – pytorch Image tensor from PIL (range 0~1), bgr format
mean – norm mean
std – norm val
- Returns
An image normalized to 0~255, rgb format
- easycv.utils.preprocess_function.randomErasing(image, probability=0.5, sl=0.02, sh=0.2, r1=0.3, mean=[0.4914, 0.4822, 0.4465])[source]¶
easycv.utils.profiling module¶
easycv.utils.py_util module¶
easycv.utils.registry module¶
- class easycv.utils.registry.Registry(name)[source]¶
Bases:
object
- property name¶
- property module_dict¶
- easycv.utils.registry.build_from_cfg(cfg, registry, default_args=None)[source]¶
Build a module from config dict.
- Parameters
cfg (dict) – Config dict. It should at least contain the key “type”.
registry (Registry) – The registry to search the type from.
default_args (dict, optional) – Default initialization arguments.
- Returns
The constructed object.
- Return type
obj
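A hedged sketch of the registry pattern; the register_module decorator is assumed from the usual mmcv-style registry that this class mirrors and is not documented on this page:
import torch.nn as nn
from easycv.utils.registry import Registry, build_from_cfg

BACKBONES = Registry('backbone')

@BACKBONES.register_module   # decorator usage assumed; consult easycv.utils.registry for the exact API
class ToyBackbone(nn.Module):
    def __init__(self, depth=18):
        super().__init__()
        self.depth = depth

# "type" selects the registered class, remaining keys become __init__ kwargs
model = build_from_cfg(dict(type='ToyBackbone', depth=50), BACKBONES)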
easycv.utils.test_util module¶
Contains functions which are convenient for unit testing.
- easycv.utils.test_util.replace_data_for_test(cfg)[source]¶
replace real data with test data
- Parameters
cfg – Config object
easycv package¶
Subpackages¶
easycv.file package¶
Submodules¶
easycv.file.base module¶
- class easycv.file.base.IOLocal[source]¶
Bases:
easycv.file.base.IOBase
easycv.file.file_io module¶
- easycv.file.file_io.set_oss_env(ak_id: str, ak_secret: str, hosts: Union[str, List[str]], buckets: Union[str, List[str]])[source]¶
- class easycv.file.file_io.IO(max_retry=10)[source]¶
Bases:
easycv.file.base.IOLocal
IO module to support both local and oss io. If accessing oss files, you need to authorize OSS; please refer to IO.access_oss.
- access_oss(ak_id: str = '', ak_secret: str = '', hosts: Union[str, List[str]] = '', buckets: Union[str, List[str]] = '')[source]¶
If accessing oss files, you need to authorize OSS in one of the following ways:
- Method 1:
from easycv.file import io
io.access_oss(
    ak_id='your_accesskey_id',
    ak_secret='your_accesskey_secret',
    hosts='your endpoint' or ['your endpoint1', 'your endpoint2'],
    buckets='your bucket' or ['your bucket1', 'your bucket2'])
- Method 2:
Add oss config to your local file ~/.ossutilconfig as follows (for more oss config information, please refer to: https://help.aliyun.com/document_detail/120072.html):
[Credentials]
language = CH
endpoint = your endpoint
accessKeyID = your_accesskey_id
accessKeySecret = your_accesskey_secret
[Bucket-Endpoint]
bucket1 = endpoint1
bucket2 = endpoint2
Then run the following command, and the config file will be read by default to authorize oss:
from easycv.file import io
io.access_oss()
- open(full_path, mode='r')[source]¶
Same usage as the python built-in open. Support local path and oss path.
Example
from easycv.file import io
io.access_oss(your oss config)  # only oss file need, refer to IO.access_oss
# Write something to a oss file.
with io.open(‘oss://bucket_name/demo.txt’, ‘w’) as f:
    f.write(“test”)
# Read from a oss file.
with io.open(‘oss://bucket_name/demo.txt’, ‘r’) as f:
    print(f.read())
- Parameters
full_path – absolute oss path
- exists(path)[source]¶
Whether the file exists, same usage as os.path.exists. Support local path and oss path.
Example
from easycv.file import io io.access_oss(your oss config) # only oss file need, refer to IO.access_oss ret = io.exists(‘oss://bucket_name/dir’) print(ret)
- Parameters
path – oss path or local path
- move(src, dst)[source]¶
Move src to dst, same usage as shutil.move. Support local path and oss path.
Example
from easycv.file import io io.access_oss(your oss config) # only oss file need, refer to IO.access_oss # move oss file to local io.move(‘oss://bucket_name/file.txt’, ‘/your/local/path/file.txt’) # move oss file to oss io.move(‘oss://bucket_name/file.txt’, ‘oss://bucket_name/file.txt’) # move local file to oss io.move(‘/your/local/file.txt’, ‘oss://bucket_name/file.txt’) # move directory io.move(‘oss://bucket_name/dir1’, ‘oss://bucket_name/dir2’)
- Parameters
src – oss path or local path
dst – oss path or local path
- copy(src, dst)[source]¶
Copy a file from src to dst. Same usage as shutil.copyfile. If you want to copy a directory, please use easycv.io.copytree
Example
from easycv.file import io io.access_oss(your oss config) # only oss file need, refer to IO.access_oss # Copy a file from local to oss: io.copy(‘/your/local/file.txt’, ‘oss://bucket/dir/file.txt’)
# Copy a oss file to local: io.copy(‘oss://bucket/dir/file.txt’, ‘/your/local/file.txt’)
# Copy a file from oss to oss:: io.copy(‘oss://bucket/dir/file.txt’, ‘oss://bucket/dir/file2.txt’)
- Parameters
src – oss path or local path
dst – oss path or local path
- copytree(src, dst)[source]¶
Copy files recursively from src to dst. Same usage as shutil.copytree. If you want to copy a file, please use easycv.io.copy.
Example
from easycv.file import io io.access_oss(your oss config) # only oss file need, refer to IO.access_oss # copy files from local to oss io.copytree(src=’/your/local/dir1’, dst=’oss://bucket_name/dir2’) # copy files from oss to local io.copytree(src=’oss://bucket_name/dir2’, dst=’/your/local/dir1’) # copy files from oss to oss io.copytree(src=’oss://bucket_name/dir1’, dst=’oss://bucket_name/dir2’)
- Parameters
src – oss path or local path
dst – oss path or local path
- listdir(path, recursive=False, full_path=False, contains: Optional[Union[str, List[str]]] = None)[source]¶
List all objects in path. Same usage as os.listdir.
Example
from easycv.file import io io.access_oss(your oss config) # only oss file need, refer to IO.access_oss ret = io.listdir(‘oss://bucket/dir’, recursive=True) print(ret)
- Parameters
path – local file path or oss path.
recursive – If False, only list the top level objects. If True, recursively list all objects.
full_path – if full path, return files with path prefix.
contains – substr to filter list files.
return: A list of path.
- remove(path)[source]¶
Remove a file or a directory recursively. Same usage as os.remove or shutil.rmtree.
Example
from easycv.file import io io.access_oss(your oss config) # only oss file need, refer to IO.access_oss # Remove a oss file io.remove(‘oss://bucket_name/file.txt’)
# Remove a oss directory io.remove(‘oss://bucket_name/dir/’)
- Parameters
path – local or oss path, file or directory
- rmtree(path)[source]¶
Remove directory recursively, same usage as shutil.rmtree
Example
from easycv.file import io io.access_oss(your oss config) # only oss file need, refer to IO.access_oss io.remove(‘oss://bucket_name/dir_name’) # Or io.remove(‘oss://bucket_name/dir_name/’)
- Parameters
path – oss path
- makedirs(path, exist_ok=True)[source]¶
Create directories recursively, same usage as os.makedirs
Example
from easycv.file import io io.access_oss(your oss config) # only oss file need, refer to IO.access_oss io.makedirs(‘oss://bucket/new_dir/’)
- Parameters
path – local or oss dir path
- isdir(path)[source]¶
Return whether a path is directory, same usage as os.path.isdir
Example
from easycv.file import io io.access_oss(your oss config) # only oss file need, refer to IO.access_oss io.isdir(‘oss://bucket/dir/’)
- Parameters
path – local or oss path
Return: bool, True or False.
- isfile(path)[source]¶
Return whether a path is file object, same usage as os.path.isfile
Example
from easycv.file import io io.access_oss(your oss config) # only oss file need, refer to IO.access_oss io.isfile(‘oss://bucket/file.txt’)
- Parameters
path – local or oss path
Return: bool, True or False.
- glob(file_path)[source]¶
Return a list of paths matching a pathname pattern. .. rubric:: Example
from easycv.file import io io.access_oss(your oss config) # only oss file need, refer to IO.access_oss io.glob(‘oss://bucket/dir/*.txt’)
- Parameters
path – local or oss file pattern
Return: list, a list of paths.
- size(path: str) → int[source]¶
Get the size of file path, same usage as os.path.getsize
Example
from easycv.file import io io.access_oss(your oss config) # only oss file need, refer to IO.access_oss size = io.size(‘oss://bucket/file.txt’) print(size)
- Parameters
path – local or oss path.
Return: size of file in bytes
- class easycv.file.file_io.OSSFile(bucket, path, position=0)[source]¶
Bases:
object
easycv.runner package¶
Submodules¶
easycv.runner.ev_runner module¶
- class easycv.runner.ev_runner.EVRunner(model, batch_processor=None, optimizer=None, work_dir=None, logger=None, meta=None)[source]¶
Bases:
mmcv.runner.epoch_based_runner.EpochBasedRunner
- __init__(model, batch_processor=None, optimizer=None, work_dir=None, logger=None, meta=None)[source]¶
Epoch Runner for easycv, add support for oss IO and file sync.
- Parameters
model (torch.nn.Module) – The model to be run.
batch_processor (callable) – A callable method that processes a data batch. The interface of this method should be batch_processor(model, data, train_mode) -> dict
optimizer (dict or torch.optim.Optimizer) – It can be either an optimizer (in most cases) or a dict of optimizers (in models that require more than one optimizer, e.g., GAN).
work_dir (str, optional) – The working directory to save checkpoints and logs. Defaults to None.
logger (logging.Logger) – Logger used during training. Defaults to None. (The default value is just for backward compatibility)
meta (dict | None) – A dict that records some important information such as environment info and seed, which will be logged in logger hook. Defaults to None.
- run_iter(data_batch, train_mode, **kwargs)[source]¶
process for each iteration.
- Parameters
data_batch – Batch of dict of data.
train_mode (bool) – If set True, run training step, else validation step.
- train(data_loader, **kwargs)[source]¶
Training process for one epoch which will iterate through all training data and call hooks at different stages.
- Parameters
data_loader – data loader object for training
- val(data_loader, **kwargs)[source]¶
Validation step, which is deprecated; use the evaluation hook instead.
- save_checkpoint(out_dir, filename_tmpl='epoch_{}.pth', save_optimizer=True, meta=None, create_symlink=True)[source]¶
Save checkpoint to file.
- Parameters
out_dir – Directory where checkpoint files are to be saved.
filename_tmpl (str, optional) – Checkpoint filename pattern.
save_optimizer (bool, optional) – save optimizer state.
meta (dict, optional) – Metadata to be saved in checkpoint.
- current_lr()[source]¶
Get current learning rates.
- Returns
Current learning rates of all param groups. If the runner has a dict of optimizers, this method will return a dict.
- Return type
list[float] | dict[str, list[float]]
- load_checkpoint(filename, map_location=device(type='cpu'), strict=False, logger=None)[source]¶
Load checkpoint from a file or URL.
- Parameters
filename (str) – Accept local filepath, URL, torchvision://xxx, open-mmlab://xxx, oss://xxx. Please refer to docs/source/model_zoo.md for details.
map_location (str) – Same as torch.load().
strict (bool) – Whether to allow different params for the model and checkpoint.
logger (logging.Logger or None) – The logger for error message.
- Returns
The loaded checkpoint.
- Return type
dict or OrderedDict
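To tie the runner API above together, a hedged construction sketch (the model and paths are illustrative; the model must implement train_step, which any easycv BaseModel subclass does):
import logging
import torch
from easycv.models.base import BaseModel
from easycv.runner.ev_runner import EVRunner

class ToyModel(BaseModel):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 2)

    def forward_train(self, img, **kwargs):
        return dict(loss=self.fc(img).mean())

model = ToyModel()
runner = EVRunner(model,
                  optimizer=torch.optim.SGD(model.parameters(), lr=0.01),
                  work_dir='./work_dir',
                  logger=logging.getLogger('easycv'))
print(runner.current_lr())                                   # [0.01]
# runner.load_checkpoint('oss://bucket/path/epoch_10.pth')   # hypothetical checkpoint path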