detectron2.modeling package

detectron2.modeling.build_anchor_generator(cfg, input_shape)[source]

Build an anchor generator from cfg.MODEL.ANCHOR_GENERATOR.NAME.

class detectron2.modeling.FPN(bottom_up, in_features, out_channels, norm='', top_block=None, fuse_type='sum')[source]

Bases: detectron2.modeling.backbone.backbone.Backbone

This module implements the Feature Pyramid Network (FPN). It creates pyramid features built on top of some input feature maps.

__init__(bottom_up, in_features, out_channels, norm='', top_block=None, fuse_type='sum')[source]
Parameters:
  • bottom_up (Backbone) – module representing the bottom up subnetwork. Must be a subclass of Backbone. The multi-scale feature maps generated by the bottom up network, and listed in in_features, are used to generate FPN levels.
  • in_features (list[str]) – names of the input feature maps coming from the backbone to which FPN is attached. For example, if the backbone produces [“res2”, “res3”, “res4”], any contiguous sublist of these may be used; order must be from high to low resolution.
  • out_channels (int) – number of channels in the output feature maps.
  • norm (str) – the normalization to use.
  • top_block (nn.Module or None) – if provided, an extra operation will be performed on the output of the last (smallest resolution) FPN output, and the result will extend the result list. The top_block further downsamples the feature map. It must have an attribute “num_levels”, meaning the number of extra FPN levels added by this block, and “in_feature”, which is a string representing its input feature (e.g., p5).
  • fuse_type (str) – types for fusing the top down features and the lateral ones. It can be “sum” (default), which sums up element-wise; or “avg”, which takes the element-wise mean of the two.
size_divisibility
forward(x)[source]
Parameters:input (dict[str->Tensor]) – mapping feature map name (e.g., “res5”) to feature map tensor for each feature level in high to low resolution order.
Returns:dict[str->Tensor] – mapping from feature map name to FPN feature map tensor in high to low resolution order. Returned feature names follow the FPN paper convention: “p<stage>”, where stage has stride = 2 ** stage e.g., [“p2”, “p3”, …, “p6”].
output_shape()[source]
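
As an illustration, the following sketch wraps a ResNet bottom-up backbone (built from a default config) with an FPN; the config values and the choice of 256 output channels are assumptions for the example, not settings from any particular model.

    from detectron2.config import get_cfg
    from detectron2.layers import ShapeSpec
    from detectron2.modeling import FPN, build_resnet_backbone

    cfg = get_cfg()
    # expose all residual stages so they can feed the FPN lateral connections
    cfg.MODEL.RESNETS.OUT_FEATURES = ["res2", "res3", "res4", "res5"]
    bottom_up = build_resnet_backbone(cfg, ShapeSpec(channels=3))
    fpn = FPN(
        bottom_up=bottom_up,
        in_features=["res2", "res3", "res4", "res5"],
        out_channels=256,
        norm="",
        fuse_type="sum",
    )
    # output_shape() reports channels/stride per pyramid level, e.g. "p2".."p5"
    print(fpn.output_shape())
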
class detectron2.modeling.Backbone[source]

Bases: torch.nn.modules.module.Module

Abstract base class for network backbones.

__init__()[source]

The __init__ method of any subclass can specify its own set of arguments.

forward()[source]

Subclasses must override this method, but adhere to the same return type.

Returns:dict[str->Tensor] – mapping from feature name (e.g., “res2”) to tensor
size_divisibility

Some backbones require the input height and width to be divisible by a specific integer. This is typically true for encoder / decoder type networks with lateral connections (e.g., FPN), for which feature maps need to have matching dimensions in the “bottom up” and “top down” paths. Set to 0 if no specific input size divisibility is required.

output_shape()[source]
Returns:dict[str->ShapeSpec]
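
A minimal sketch of a custom Backbone subclass following the interface documented above; ToyBackbone is a hypothetical name used only for illustration.

    import torch
    from torch import nn
    from detectron2.layers import ShapeSpec
    from detectron2.modeling import Backbone

    class ToyBackbone(Backbone):
        def __init__(self):
            super().__init__()
            # a single conv that downsamples the image by 4x
            self.conv = nn.Conv2d(3, 64, kernel_size=3, stride=4, padding=1)

        def forward(self, x):
            # must return a dict mapping feature name -> tensor
            return {"toy1": self.conv(x)}

        def output_shape(self):
            return {"toy1": ShapeSpec(channels=64, stride=4)}

    features = ToyBackbone()(torch.randn(1, 3, 64, 64))  # {"toy1": (1, 64, 16, 16)}
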
class detectron2.modeling.ResNet(stem, stages, num_classes=None, out_features=None)[source]

Bases: detectron2.modeling.backbone.backbone.Backbone

__init__(stem, stages, num_classes=None, out_features=None)[source]
Parameters:
  • stem (nn.Module) – a stem module
  • stages (list[list[ResNetBlock]]) – several (typically 4) stages, each containing multiple ResNetBlockBase.
  • num_classes (None or int) – if None, will not perform classification.
  • out_features (list[str]) – names of the layers whose outputs should be returned in forward(). Can be anything in “stem”, “linear”, or “res2” … If None, the output of the last layer will be returned.
forward(x)[source]
output_shape()[source]
class detectron2.modeling.ResNetBlockBase(in_channels, out_channels, stride)[source]

Bases: torch.nn.modules.module.Module

__init__(in_channels, out_channels, stride)[source]

The __init__ method of any subclass should also contain these arguments.

Parameters:
  • in_channels (int) –
  • out_channels (int) –
  • stride (int) –
freeze()[source]
detectron2.modeling.build_backbone(cfg, input_shape=None)[source]

Build a backbone from cfg.MODEL.BACKBONE.NAME.

Returns:an instance of Backbone
detectron2.modeling.build_resnet_backbone(cfg, input_shape)[source]

Create a ResNet instance from config.

Returns:ResNet – a ResNet instance.
detectron2.modeling.make_stage(block_class, num_blocks, first_stride, **kwargs)[source]

Create a ResNet stage by creating many blocks.

Parameters:
  • block_class (class) – a subclass of ResNetBlockBase.
  • num_blocks (int) – number of blocks in the stage.
  • first_stride (int) – the stride of the first block; the other blocks will have stride=1. A stride argument will be passed to the block constructor.
  • kwargs – other arguments passed to the block constructor.
Returns:list[nn.Module] – a list of block modules.
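
For example, a stage of three bottleneck blocks could be created as follows; this assumes the BottleneckBlock class from detectron2.modeling.backbone.resnet, and the channel numbers are illustrative.

    from detectron2.modeling import make_stage
    from detectron2.modeling.backbone.resnet import BottleneckBlock

    # a "res2"-like stage: 3 bottleneck blocks, no spatial downsampling (first_stride=1)
    blocks = make_stage(
        BottleneckBlock,
        num_blocks=3,
        first_stride=1,
        in_channels=64,
        bottleneck_channels=64,
        out_channels=256,
    )
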
class detectron2.modeling.GeneralizedRCNN(cfg)[source]

Bases: torch.nn.modules.module.Module

Generalized R-CNN. Any model that contains the following three components: 1. Per-image feature extraction (aka backbone) 2. Region proposal generation 3. Per-region feature extraction and prediction

visualize_training(batched_inputs, proposals)[source]

A function used to visualize images and proposals. It shows ground truth bounding boxes on the original image and up to 20 predicted object proposals on the original image. Users can implement different visualization functions for different models.

Parameters:
  • batched_inputs (list) – a list that contains input to the model.
  • proposals (list) – a list that contains predicted proposals. Both batched_inputs and proposals should have the same length.
forward(batched_inputs)[source]
Parameters:batched_inputs

a list, batched outputs of DatasetMapper. Each item in the list contains the inputs for one image. For now, each item in the list is a dict that contains:

  • image: Tensor, image in (C, H, W) format.
  • instances (optional): groundtruth Instances
  • proposals (optional): Instances, precomputed proposals.

Other information that’s included in the original dicts, such as:

  • ”height”, “width” (int): the output resolution of the model, used in inference.
    See postprocess() for details.
Returns:list[dict] – Each dict is the output for one input image. The dict contains one key “instances” whose value is an Instances. The Instances object has the following keys:
”pred_boxes”, “pred_classes”, “scores”, “pred_masks”, “pred_keypoints”
inference(batched_inputs, detected_instances=None, do_postprocess=True)[source]

Run inference on the given inputs.

Parameters:
  • batched_inputs (list[dict]) – same as in forward()
  • detected_instances (None or list[Instances]) – if not None, it contains an Instances object per image. The Instances object contains “pred_boxes” and “pred_classes” which are known boxes in the image. The inference will then skip the detection of bounding boxes, and only predict other per-ROI outputs.
  • do_postprocess (bool) – whether to apply post-processing on the outputs.
Returns:

same as in forward().

preprocess_image(batched_inputs)[source]

Normalize, pad and batch the input images.
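
A hedged sketch of the input/output format at inference time; `model` is assumed to be a GeneralizedRCNN already in eval mode (see build_model below), and the pixel range/format of the image tensor must match what the model was trained with.

    import torch

    image = torch.rand(3, 480, 640) * 255          # (C, H, W); values are placeholders
    inputs = [{"image": image, "height": 480, "width": 640}]
    with torch.no_grad():
        outputs = model(inputs)                    # model: a GeneralizedRCNN in eval mode
    instances = outputs[0]["instances"]            # pred_boxes, pred_classes, scores, ...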

class detectron2.modeling.PanopticFPN(cfg)[source]

Bases: torch.nn.modules.module.Module

Main class for Panoptic FPN architectures (see https://arxiv.org/abs/1901.02446).

forward(batched_inputs)[source]
Parameters:batched_inputs – a list, batched outputs of DatasetMapper. Each item in the list contains the inputs for one image.
For now, each item in the list is a dict that contains:

  • image: Tensor, image in (C, H, W) format.
  • instances: Instances
  • sem_seg: semantic segmentation ground truth.

Other information that’s included in the original dicts, such as:

  • ”height”, “width” (int): the output resolution of the model, used in inference.
    See postprocess() for details.
Returns:list[dict] – Each dict is the result for one image. The dict contains the following keys:
  • ”instances”: see GeneralizedRCNN.forward() for its format.
  • ”sem_seg”: see SemanticSegmentor.forward() for its format.
  • ”panoptic_seg”: available when PANOPTIC_FPN.COMBINE.ENABLED. See the return value of combine_semantic_and_instance_outputs() for its format.
class detectron2.modeling.ProposalNetwork(cfg)[source]

Bases: torch.nn.modules.module.Module

forward(batched_inputs)[source]

Parameters:batched_inputs – same as in GeneralizedRCNN.forward().

Returns:list[dict] – Each dict is the output for one input image. The dict contains one key “proposals” whose value is an Instances with keys “proposal_boxes” and “objectness_logits”.
class detectron2.modeling.RetinaNet(cfg)[source]

Bases: torch.nn.modules.module.Module

Implement RetinaNet (https://arxiv.org/abs/1708.02002).

forward(batched_inputs)[source]
Parameters:batched_inputs

a list, batched outputs of DatasetMapper. Each item in the list contains the inputs for one image. For now, each item in the list is a dict that contains:

  • image: Tensor, image in (C, H, W) format.
  • instances: Instances

Other information that’s included in the original dicts, such as:

  • ”height”, “width” (int): the output resolution of the model, used in inference.
    See postprocess() for details.
Returns:dict[str->Tensor] – mapping from a named loss to a tensor storing the loss. Used during training only.
losses(gt_classes, gt_anchors_deltas, pred_class_logits, pred_anchor_deltas)[source]
Parameters:
  • gt_classes, gt_anchors_deltas – see RetinaNet.get_ground_truth(). Their shapes are (N, R) and (N, R, 4), where R is the total number of anchors across levels, i.e. the sum of Hi x Wi x A.
  • pred_class_logits, pred_anchor_deltas – see RetinaNetHead.forward().
Returns:dict[str->Tensor] – mapping from a named loss to a scalar tensor storing the loss. Used during training only. The dict keys are: “loss_cls” and “loss_box_reg”.

get_ground_truth(anchors, targets)[source]
Parameters:
  • anchors (list[list[Boxes]]) – a list of N=#image elements. Each is a list of #feature level Boxes. The Boxes contains anchors of this image on the specific feature level.
  • targets (list[Instances]) – a list of N Instances. The i-th Instances contains the ground-truth per-instance annotations for the i-th input image. Specify targets during training only.
Returns:

gt_classes (Tensor) – An integer tensor of shape (N, R) storing ground-truth labels for each anchor. R is the total number of anchors, i.e. the sum of Hi x Wi x A for all levels. Anchors with an IoU with some target higher than the foreground threshold are assigned their corresponding label in the [0, K-1] range. Anchors whose IoU are below the background threshold are assigned the label “K”. Anchors whose IoU are between the foreground and background thresholds are assigned a label “-1”, i.e. ignore.

gt_anchors_deltas (Tensor) – Shape (N, R, 4). The last dimension represents ground-truth box2box transform targets (dx, dy, dw, dh) that map each anchor to its matched ground-truth box. The values in the tensor are meaningful only when the corresponding anchor is labeled as foreground.

inference(box_cls, box_delta, anchors, images)[source]
Parameters:
  • box_cls, box_delta – Same as the output of RetinaNetHead.forward()
  • anchors (list[list[Boxes]]) – a list of #images elements. Each is a list of #feature level Boxes. The Boxes contain anchors of this image on the specific feature level.
  • images (ImageList) – the input images
Returns:

results (List[Instances]) – a list of #images elements.

inference_single_image(box_cls, box_delta, anchors, image_size)[source]

Single-image inference. Return bounding-box detection results by thresholding on scores and applying non-maximum suppression (NMS).

Parameters:
  • box_cls (list[Tensor]) – list of #feature levels. Each entry contains tensor of size (H x W x A, K)
  • box_delta (list[Tensor]) – Same shape as ‘box_cls’ except that K becomes 4.
  • anchors (list[Boxes]) – list of #feature levels. Each entry contains a Boxes object, which contains all the anchors for that image in that feature level.
  • image_size (tuple(H, W)) – a tuple of the image height and width.
Returns:

Same as inference, but for only one image.

preprocess_image(batched_inputs)[source]

Normalize, pad and batch the input images.

class detectron2.modeling.SemanticSegmentor(cfg)[source]

Bases: torch.nn.modules.module.Module

Main class for semantic segmentation architectures.

forward(batched_inputs)[source]
Parameters:batched_inputs – a list, batched outputs of DatasetMapper. Each item in the list contains the inputs for one image.
For now, each item in the list is a dict that contains:

  • image: Tensor, image in (C, H, W) format.
  • sem_seg: semantic segmentation ground truth.

Other information that’s included in the original dicts, such as:

  • ”height”, “width” (int): the output resolution of the model, used in inference.
    See postprocess() for details.
Returns:list[dict] – Each dict is the output for one input image. The dict contains one key “sem_seg” whose value is a Tensor of the output resolution that represents the per-pixel segmentation prediction.
detectron2.modeling.build_model(cfg)[source]

Build the whole model architecture, defined by cfg.MODEL.META_ARCHITECTURE. Note that it does not load any weights from cfg.
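
A minimal sketch, assuming a config file from the detectron2 model zoo; since build_model only creates the architecture, weights are loaded separately (here with DetectionCheckpointer).

    from detectron2 import model_zoo
    from detectron2.checkpoint import DetectionCheckpointer
    from detectron2.config import get_cfg
    from detectron2.modeling import build_model

    cfg = get_cfg()
    cfg.merge_from_file(
        model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
    )
    model = build_model(cfg)  # a GeneralizedRCNN for this config, weights not loaded
    DetectionCheckpointer(model).load(cfg.MODEL.WEIGHTS)  # or a local path / zoo URL
    model.eval()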

detectron2.modeling.build_sem_seg_head(cfg, input_shape)[source]

Build a semantic segmentation head from cfg.MODEL.SEM_SEG_HEAD.NAME.

detectron2.modeling.detector_postprocess(results, output_height, output_width, mask_threshold=0.5)[source]

Resize the output instances. The input images are often resized when entering an object detector. As a result, we often need the outputs of the detector in a different resolution from its inputs.

This function will resize the raw outputs of an R-CNN detector to produce outputs according to the desired output resolution.

Parameters:
  • results (Instances) – the raw outputs from the detector. results.image_size contains the input image resolution the detector sees. This object might be modified in-place.
  • output_height, output_width – the desired output resolution.
Returns:

Instances – the resized output from the model, based on the output resolution
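
For instance, assuming `instances` is the raw Instances output of a detector that saw a resized input, a short sketch of rescaling it back to the desired resolution:

    from detectron2.modeling import detector_postprocess

    # `instances` is assumed to come from the detector; 480x640 is the desired
    # output resolution (e.g. the original image size before resizing).
    results = detector_postprocess(instances, output_height=480, output_width=640)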

detectron2.modeling.build_proposal_generator(cfg, input_shape)[source]

Build a proposal generator from cfg.MODEL.PROPOSAL_GENERATOR.NAME. The name can be “PrecomputedProposals” to use no proposal generator.

detectron2.modeling.build_rpn_head(cfg, input_shape)[source]

Build an RPN head defined by cfg.MODEL.RPN.HEAD_NAME.

class detectron2.modeling.ROIHeads(cfg, input_shape: Dict[str, detectron2.layers.shape_spec.ShapeSpec])[source]

Bases: torch.nn.modules.module.Module

ROIHeads perform all per-region computation in an R-CNN.

It contains the logic of cropping the regions, extracting per-region features, and making per-region predictions.

It can have many variants, implemented as subclasses of this class.

label_and_sample_proposals(proposals, targets)[source]

Prepare some proposals to be used to train the ROI heads. It performs box matching between proposals and targets, and assigns training labels to the proposals. It returns self.batch_size_per_image random samples from proposals and groundtruth boxes, with a fraction of positives that is no larger than self.positive_sample_fraction.

Parameters: see ROIHeads.forward().

Returns:list[Instances] – length N list of Instances containing the proposals sampled for training. Each Instances has the following fields:
  • proposal_boxes: the proposal boxes
  • gt_boxes: the ground-truth box that the proposal is assigned to (this is only meaningful if the proposal has a label > 0; if label = 0 then the ground-truth box is random)

Other fields such as “gt_classes” and “gt_masks” that are included in targets.

forward(images, features, proposals, targets=None)[source]
Parameters:
  • images (ImageList) –
  • features (dict[str->Tensor]) – input data as a mapping from feature map name to tensor. Axis 0 represents the number of images N in the input data; axes 1-3 are channels, height, and width, which may vary between feature maps (e.g., if a feature pyramid is used).
  • proposals (list[Instances]) – length N list of Instances. The i-th Instances contains object proposals for the i-th input image, with fields “proposal_boxes” and “objectness_logits”.
  • targets (list[Instances], optional) –

    length N list of Instances. The i-th Instances contains the ground-truth per-instance annotations for the i-th input image. Specify targets during training only. It may have the following fields:

    • gt_boxes: the bounding box of each instance.
    • gt_classes: the label for each instance with a category ranging in [0, #class].
    • gt_masks: PolygonMasks or BitMasks, the ground-truth masks of each instance.
    • gt_keypoints: NxKx3, the ground-truth keypoints for each instance.
Returns:

results (list[Instances]) – length N list of Instances containing the detected instances. Returned during inference only; may be [] during training.

losses (dict[str->Tensor]): mapping from a named loss to a tensor storing the loss. Used during training only.

class detectron2.modeling.StandardROIHeads(cfg, input_shape)[source]

Bases: detectron2.modeling.roi_heads.roi_heads.ROIHeads

It’s “standard” in a sense that there is no ROI transform sharing or feature sharing between tasks. The cropped rois go to separate branches (boxes and masks) directly. This way, it is easier to make separate abstractions for different branches.

This class is used by most models, such as FPN and C5. To implement more models, you can subclass it and implement a different forward() or a head.

forward(images, features, proposals, targets=None)[source]

See ROIHeads.forward.

forward_with_given_boxes(features, instances)[source]

Use the given boxes in instances to produce other (non-box) per-ROI outputs.

This is useful for downstream tasks where a box is known, but one needs to obtain other attributes (outputs of other heads). Test-time augmentation also uses this.

Parameters:
  • features – same as in forward()
  • instances (list[Instances]) – instances to predict other outputs. Expect the keys “pred_boxes” and “pred_classes” to exist.
Returns:

instances (list[Instances]) – the same Instances objects, with extra fields such as pred_masks or pred_keypoints.

detectron2.modeling.build_box_head(cfg, input_shape)[source]

Build a box head defined by cfg.MODEL.ROI_BOX_HEAD.NAME.

detectron2.modeling.build_keypoint_head(cfg, input_shape)[source]

Build a keypoint head from cfg.MODEL.ROI_KEYPOINT_HEAD.NAME.

detectron2.modeling.build_mask_head(cfg, input_shape)[source]

Build a mask head defined by cfg.MODEL.ROI_MASK_HEAD.NAME.

detectron2.modeling.build_roi_heads(cfg, input_shape)[source]

Build ROIHeads defined by cfg.MODEL.ROI_HEADS.NAME.

class detectron2.modeling.DatasetMapperTTA(cfg)[source]

Bases: object

Implement test-time augmentation for detection data. It is a callable which takes a dataset dict from a detection dataset, and returns a list of dataset dicts where the images are augmented from the input image by the transformations defined in the config. This is used for test-time augmentation.

__call__(dataset_dict)[source]
Parameters:dataset_dict (dict) – a detection dataset dict
Returns:list[dict] – a list of dataset dicts, which contain augmented version of the input image. The total number of dicts is len(min_sizes) * (2 if flip else 1).
class detectron2.modeling.GeneralizedRCNNWithTTA(cfg, model, tta_mapper=None, batch_size=3)[source]

Bases: torch.nn.modules.module.Module

A GeneralizedRCNN with test-time augmentation enabled. Its __call__() method has the same interface as GeneralizedRCNN.forward().

__init__(cfg, model, tta_mapper=None, batch_size=3)[source]
Parameters:
  • cfg (CfgNode) –
  • model (GeneralizedRCNN) – a GeneralizedRCNN to apply TTA on.
  • tta_mapper (callable) – takes a dataset dict and returns a list of augmented versions of the dataset dict. Defaults to DatasetMapperTTA(cfg).
  • batch_size (int) – batch the augmented images into this batch size for inference.
__call__(batched_inputs)[source]

Same input/output format as GeneralizedRCNN.forward()
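
A brief sketch of enabling TTA, assuming `cfg` and `model` were created as in the build_model example above and `inputs` follows the GeneralizedRCNN input format.

    from detectron2.modeling import GeneralizedRCNNWithTTA

    tta_model = GeneralizedRCNNWithTTA(cfg, model)  # uses DatasetMapperTTA(cfg) by default
    outputs = tta_model(inputs)  # same output format as GeneralizedRCNN.forward()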

detectron2.modeling.poolers module

class detectron2.modeling.poolers.ROIPooler(output_size, scales, sampling_ratio, pooler_type, canonical_box_size=224, canonical_level=4)[source]

Bases: torch.nn.modules.module.Module

Region of interest feature map pooler that supports pooling from one or more feature maps.

__init__(output_size, scales, sampling_ratio, pooler_type, canonical_box_size=224, canonical_level=4)[source]
Parameters:
  • output_size (int, tuple[int] or list[int]) – output size of the pooled region, e.g., 14 x 14. If tuple or list is given, the length must be 2.
  • scales (list[float]) – The scale for each low-level pooling op relative to the input image. For a feature map with stride s relative to the input image, scale is defined as 1 / s. The stride must be a power of 2. When there are multiple scales, they must form a pyramid, i.e. they must be a monotonically decreasing geometric sequence with a factor of 1/2.
  • sampling_ratio (int) – The sampling_ratio parameter for the ROIAlign op.
  • pooler_type (string) – Name of the type of pooling operation that should be applied. For instance, “ROIPool” or “ROIAlignV2”.
  • canonical_box_size (int) – A canonical box size in pixels (sqrt(box area)). The default is heuristically defined as 224 pixels in the FPN paper (based on ImageNet pre-training).
  • canonical_level (int) –

    The feature map level index from which a canonically-sized box should be placed. The default is defined as level 4 (stride=16) in the FPN paper, i.e., a box of size 224x224 will be placed on the feature with stride=16. The box placement for all boxes will be determined from their sizes w.r.t canonical_box_size. For example, a box whose area is 4x that of a canonical box should be used to pool features from feature level canonical_level+1.

    Note that the actual input feature maps given to this module may not have sufficiently many levels for the input boxes. If the boxes are too large or too small for the input feature maps, the closest level will be used.

forward(x, box_lists)[source]
Parameters:
  • x (list[Tensor]) – A list of feature maps of NCHW shape, with scales matching those used to construct this module.
  • box_lists (list[Boxes] | list[RotatedBoxes]) – A list of N Boxes or N RotatedBoxes, where N is the number of images in the batch. The box coordinates are defined on the original image and will be scaled by the scales argument of ROIPooler.
Returns:

Tensor – A tensor of shape (M, C, output_size, output_size) where M is the total number of boxes aggregated over all N batch images and C is the number of channels in x.
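
An illustrative sketch of pooling from two pyramid levels; the feature shapes and boxes below are made up for the example.

    import torch
    from detectron2.modeling.poolers import ROIPooler
    from detectron2.structures import Boxes

    pooler = ROIPooler(
        output_size=7,
        scales=(1.0 / 4, 1.0 / 8),   # feature strides 4 and 8
        sampling_ratio=0,
        pooler_type="ROIAlignV2",
    )
    features = [torch.randn(2, 256, 64, 64), torch.randn(2, 256, 32, 32)]
    boxes = [  # one Boxes per image, in original-image coordinates
        Boxes(torch.tensor([[10.0, 10.0, 100.0, 120.0]])),
        Boxes(torch.tensor([[20.0, 30.0, 80.0, 90.0]])),
    ]
    pooled = pooler(features, boxes)  # shape (2, 256, 7, 7): 2 boxes over the batch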

detectron2.modeling.sampling module

detectron2.modeling.sampling.subsample_labels(labels, num_samples, positive_fraction, bg_label)[source]

Return num_samples (or fewer, if not enough found) random samples from labels which is a mixture of positives & negatives. It will try to return as many positives as possible without exceeding positive_fraction * num_samples, and then try to fill the remaining slots with negatives.

Parameters:
  • labels (Tensor) – (N, ) label vector with values: -1 = ignore; bg_label = background (“negative”) class; otherwise = one or more foreground (“positive”) classes.
  • num_samples (int) – The total number of labels with value >= 0 to return. Values that are not sampled will be filled with -1 (ignore).
  • positive_fraction (float) – The number of subsampled labels with values > 0 is min(num_positives, int(positive_fraction * num_samples)). The number of negatives sampled is min(num_negatives, num_samples - num_positives_sampled). In other words, if there are not enough positives, the sample is filled with negatives. If there are also not enough negatives, then as many elements are sampled as is possible.
  • bg_label (int) – label index of background (“negative”) class.
Returns:

pos_idx, neg_idx (Tensor) – 1D vector of indices. The total length of both is num_samples or fewer.
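
A small sketch on a toy label vector, assuming the RPN convention that the background class is label 0 and foreground labels are positive.

    import torch
    from detectron2.modeling.sampling import subsample_labels

    labels = torch.tensor([1, 0, 0, 2, -1, 0, 1, 0])   # -1 = ignore, 0 = background
    pos_idx, neg_idx = subsample_labels(
        labels, num_samples=4, positive_fraction=0.5, bg_label=0
    )
    # pos_idx holds up to 2 foreground indices; neg_idx fills the remaining slots
    # with background indices, for a combined total of at most 4.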

detectron2.modeling.box_regression module

class detectron2.modeling.box_regression.Box2BoxTransform(weights, scale_clamp=4.135166556742356)[source]

Bases: object

The box-to-box transform defined in R-CNN. The transformation is parameterized by 4 deltas: (dx, dy, dw, dh). The transformation scales the box’s width and height by exp(dw), exp(dh) and shifts a box’s center by the offset (dx * width, dy * height).

__init__(weights, scale_clamp=4.135166556742356)[source]
Parameters:
  • weights (4-element tuple) – Scaling factors that are applied to the (dx, dy, dw, dh) deltas. In Fast R-CNN, these were originally set such that the deltas have unit variance; now they are treated as hyperparameters of the system.
  • scale_clamp (float) – When predicting deltas, the predicted box scaling factors (dw and dh) are clamped such that they are <= scale_clamp.
get_deltas(src_boxes, target_boxes)[source]

Get box regression transformation deltas (dx, dy, dw, dh) that can be used to transform the src_boxes into the target_boxes. That is, the relation target_boxes == self.apply_deltas(deltas, src_boxes) is true (unless any delta is too large and is clamped).

Parameters:
  • src_boxes (Tensor) – source boxes, e.g., object proposals
  • target_boxes (Tensor) – target of the transformation, e.g., ground-truth boxes.
apply_deltas(deltas, boxes)[source]

Apply transformation deltas (dx, dy, dw, dh) to boxes.

Parameters:
  • deltas (Tensor) – transformation deltas of shape (N, k*4), where k >= 1. deltas[i] represents k potentially different class-specific box transformations for the single box boxes[i].
  • boxes (Tensor) – boxes to transform, of shape (N, 4)
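
A quick sketch of the round-trip relation described under get_deltas(), using arbitrary boxes in (x1, y1, x2, y2) format.

    import torch
    from detectron2.modeling.box_regression import Box2BoxTransform

    transform = Box2BoxTransform(weights=(1.0, 1.0, 1.0, 1.0))
    src = torch.tensor([[0.0, 0.0, 10.0, 10.0]])      # e.g. a proposal
    tgt = torch.tensor([[2.0, 3.0, 12.0, 14.0]])      # e.g. its ground-truth box
    deltas = transform.get_deltas(src, tgt)           # (dx, dy, dw, dh)
    recovered = transform.apply_deltas(deltas, src)   # ~= tgt, up to numerical error
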
class detectron2.modeling.box_regression.Box2BoxTransformRotated(weights, scale_clamp=4.135166556742356)[source]

Bases: object

The box-to-box transform defined in Rotated R-CNN. The transformation is parameterized by 5 deltas: (dx, dy, dw, dh, da). The transformation scales the box’s width and height by exp(dw), exp(dh), shifts a box’s center by the offset (dx * width, dy * height), and rotates a box’s angle by da (radians). Note: angles of deltas are in radians while angles of boxes are in degrees.

__init__(weights, scale_clamp=4.135166556742356)[source]
Parameters:
  • weights (5-element tuple) – Scaling factors that are applied to the (dx, dy, dw, dh, da) deltas. These are treated as hyperparameters of the system.
  • scale_clamp (float) – When predicting deltas, the predicted box scaling factors (dw and dh) are clamped such that they are <= scale_clamp.
get_deltas(src_boxes, target_boxes)[source]

Get box regression transformation deltas (dx, dy, dw, dh, da) that can be used to transform the src_boxes into the target_boxes. That is, the relation target_boxes == self.apply_deltas(deltas, src_boxes) is true (unless any delta is too large and is clamped).

Parameters:
  • src_boxes (Tensor) – Nx5 source boxes, e.g., object proposals
  • target_boxes (Tensor) – Nx5 target of the transformation, e.g., ground-truth boxes.
apply_deltas(deltas, boxes)[source]

Apply transformation deltas (dx, dy, dw, dh, da) to boxes.

Parameters:
  • deltas (Tensor) – transformation deltas of shape (N, 5). deltas[i] represents box transformation for the single box boxes[i].
  • boxes (Tensor) – boxes to transform, of shape (N, 5)

Model Registries

These are the different registries provided in modeling. Each registry gives you the ability to replace a built-in component with your customized one, without having to modify detectron2’s code.

Note that registries cannot let you customize arbitrary lines of code directly. Even to change just one line, you’ll likely need to find the smallest registry which contains that line, and register your own component to that registry.
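
For example, a custom backbone can be plugged in through BACKBONE_REGISTRY and selected by name in the config; ToyBackbone here refers to the hypothetical class sketched under Backbone above.

    from detectron2.config import get_cfg
    from detectron2.modeling import BACKBONE_REGISTRY, build_backbone

    @BACKBONE_REGISTRY.register()
    def build_toy_backbone(cfg, input_shape):
        # must return a Backbone instance; ToyBackbone is the illustrative class above
        return ToyBackbone()

    cfg = get_cfg()
    cfg.MODEL.BACKBONE.NAME = "build_toy_backbone"
    backbone = build_backbone(cfg)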

detectron2.modeling.META_ARCH_REGISTRY = <fvcore.common.registry.Registry object>

Registry for meta-architectures, i.e. the whole model.

The registered object will be called with obj(cfg) and expected to return a nn.Module object.

detectron2.modeling.BACKBONE_REGISTRY = <fvcore.common.registry.Registry object>

Registry for backbones, which extract feature maps from images.

The registered object must be a callable that accepts two arguments:

  1. A detectron2.config.CfgNode
  2. A detectron2.layers.ShapeSpec, which contains the input shape specification.

It must return an instance of Backbone.

detectron2.modeling.PROPOSAL_GENERATOR_REGISTRY = <fvcore.common.registry.Registry object>

Registry for proposal generators, which produce object proposals from feature maps.

The registered object will be called with obj(cfg, input_shape). The call should return a nn.Module object.

detectron2.modeling.ROI_HEADS_REGISTRY = <fvcore.common.registry.Registry object>

Registry for ROI heads in a generalized R-CNN model. ROIHeads take feature maps and region proposals, and perform per-region computation.

The registered object will be called with obj(cfg, input_shape). The call is expected to return an ROIHeads.

detectron2.modeling.ROI_BOX_HEAD_REGISTRY = <fvcore.common.registry.Registry object>

Registry for box heads, which make box predictions from per-region features.

The registered object will be called with obj(cfg, input_shape).

detectron2.modeling.ROI_MASK_HEAD_REGISTRY = <fvcore.common.registry.Registry object>

Registry for mask heads, which predict instance masks given per-region features.

The registered object will be called with obj(cfg, input_shape).

detectron2.modeling.ROI_KEYPOINT_HEAD_REGISTRY = <fvcore.common.registry.Registry object>

Registry for keypoint heads, which make keypoint predictions from per-region features.

The registered object will be called with obj(cfg, input_shape).