# detectron2.modeling package¶

detectron2.modeling.build_anchor_generator(cfg, input_shape)[source]

Build an anchor generator from cfg.MODEL.ANCHOR_GENERATOR.NAME.

class detectron2.modeling.FPN(bottom_up, in_features, out_channels, norm='', top_block=None, fuse_type='sum')[source]

Bases: detectron2.modeling.backbone.backbone.Backbone

This module implements Feature Pyramid Networks for Object Detection. It creates pyramid features built on top of some input feature maps.

__init__(bottom_up, in_features, out_channels, norm='', top_block=None, fuse_type='sum')[source]
Parameters
• bottom_up (Backbone) – module representing the bottom up subnetwork. Must be a subclass of Backbone. The multi-scale feature maps generated by the bottom up network, and listed in in_features, are used to generate FPN levels.

• in_features (list[str]) – names of the input feature maps coming from the backbone to which FPN is attached. For example, if the backbone produces [“res2”, “res3”, “res4”], any contiguous sublist of these may be used; order must be from high to low resolution.

• out_channels (int) – number of channels in the output feature maps.

• norm (str) – the normalization to use.

• top_block (nn.Module or None) – if provided, an extra operation will be performed on the output of the last (smallest resolution) FPN output, and the result will extend the result list. The top_block further downsamples the feature map. It must have an attribute “num_levels”, meaning the number of extra FPN levels added by this block, and “in_feature”, which is a string representing its input feature (e.g., p5).

• fuse_type (str) – types for fusing the top down features and the lateral ones. It can be “sum” (default), which sums up element-wise; or “avg”, which takes the element-wise mean of the two.

property size_divisibility
forward(x)[source]
Parameters

input (dict[str->Tensor]) – mapping feature map name (e.g., “res5”) to feature map tensor for each feature level in high to low resolution order.

Returns

dict[str->Tensor] – mapping from feature map name to FPN feature map tensor in high to low resolution order. Returned feature names follow the FPN paper convention: “p<stage>”, where stage has stride = 2 ** stage, e.g., [“p2”, “p3”, …, “p6”].
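The naming convention above can be sketched in plain Python (an illustration of the convention only, not detectron2's implementation; `fpn_output_names` is a hypothetical helper):

```python
import math

def fpn_output_names(in_strides, num_top_levels=0):
    """Map input feature strides (e.g. res2..res5 -> 4, 8, 16, 32) to FPN
    output names "p<stage>" where stride == 2 ** stage. A top_block appends
    extra levels, each doubling the stride of the previous one."""
    stages = [int(math.log2(s)) for s in in_strides]
    # top_block levels continue past the last (smallest-resolution) output
    stages += [stages[-1] + i + 1 for i in range(num_top_levels)]
    return ["p{}".format(s) for s in stages]

# res2..res5 plus one extra downsampled level -> p2..p6
assert fpn_output_names([4, 8, 16, 32], num_top_levels=1) == \
    ["p2", "p3", "p4", "p5", "p6"]
```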

output_shape()[source]
class detectron2.modeling.Backbone[source]

Bases: torch.nn.modules.module.Module

Abstract base class for network backbones.

__init__()[source]

The __init__ method of any subclass can specify its own set of arguments.

abstract forward()[source]

Subclasses must override this method, but adhere to the same return type.

Returns

dict[str->Tensor] – mapping from feature name (e.g., “res2”) to tensor

property size_divisibility

Some backbones require the input height and width to be divisible by a specific integer. This is typically true for encoder / decoder type networks with lateral connection (e.g., FPN) for which feature maps need to match dimension in the “bottom up” and “top down” paths. Set to 0 if no specific input size divisibility is required.

output_shape()[source]
Returns

dict[str->ShapeSpec]
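The Backbone contract above can be sketched with a toy subclass (plain-Python stand-ins, assuming nothing about detectron2 internals; `ToyBackbone` and the string "feature maps" are purely illustrative, and the real class derives from nn.Module):

```python
from abc import ABC, abstractmethod

class Backbone(ABC):
    """Toy restatement of the contract: forward() returns a dict of named
    feature maps; size_divisibility constrains valid input sizes."""

    @abstractmethod
    def forward(self, x):
        """Return dict[str -> feature map]."""

    @property
    def size_divisibility(self):
        return 0  # 0 means no divisibility requirement

class ToyBackbone(Backbone):
    def forward(self, x):
        # real backbones return tensors; strings stand in here
        return {"res2": "stride-4 feature", "res3": "stride-8 feature"}

    @property
    def size_divisibility(self):
        return 32  # e.g., an FPN-style network may need H, W divisible by 32

b = ToyBackbone()
assert set(b.forward(None)) == {"res2", "res3"}
assert b.size_divisibility == 32
```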

class detectron2.modeling.ResNet(stem, stages, num_classes=None, out_features=None)[source]

Bases: detectron2.modeling.backbone.backbone.Backbone

__init__(stem, stages, num_classes=None, out_features=None)[source]
Parameters
• stem (nn.Module) – a stem module

• stages (list[list[CNNBlockBase]]) – several (typically 4) stages, each contains multiple CNNBlockBase.

• num_classes (None or int) – if None, will not perform classification. Otherwise, will create a linear layer.

• out_features (list[str]) – name of the layers whose outputs should be returned in forward. Can be anything in “stem”, “linear”, or “res2” … If None, will return the output of the last layer.

forward(x)[source]
output_shape()[source]
freeze(freeze_at=0)[source]

Freeze the first several stages of the ResNet. Commonly used in fine-tuning.

Layers that produce the same feature map spatial size are defined as one “stage” by Feature Pyramid Networks for Object Detection.

Parameters

freeze_at (int) – number of stages to freeze. 1 means freezing the stem. 2 means freezing the stem and one residual stage, etc.

Returns

nn.Module – this ResNet itself

detectron2.modeling.ResNetBlockBase

alias of detectron2.layers.blocks.CNNBlockBase

detectron2.modeling.build_backbone(cfg, input_shape=None)[source]

Build a backbone from cfg.MODEL.BACKBONE.NAME.

Returns

an instance of Backbone

detectron2.modeling.build_resnet_backbone(cfg, input_shape)[source]

Create a ResNet instance from config.

Returns

ResNet – a ResNet instance.

detectron2.modeling.make_stage(block_class, num_blocks, first_stride, *, in_channels=None, out_channels=None, **kwargs)[source]

Create a list of blocks just like those in a ResNet stage.

Parameters
• block_class (type) – a subclass of ResNetBlockBase

• num_blocks (int) –

• first_stride (int) – the stride of the first block. The other blocks will have stride=1.

• in_channels (int) – input channels of the entire stage.

• out_channels (int) – output channels of every block in the stage.

• kwargs – other arguments passed to the constructor of every block.

Returns

list[nn.Module] – a list of block modules.
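The stride pattern that first_stride produces can be sketched as follows (a plain-Python illustration of the rule stated above, not the real block-building code):

```python
def make_stage_strides(num_blocks, first_stride):
    """The first block of a stage carries the stride; all others use 1."""
    return [first_stride] + [1] * (num_blocks - 1)

# e.g., a 4-block stage that downsamples once at its start
assert make_stage_strides(4, 2) == [2, 1, 1, 1]
```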

class detectron2.modeling.GeneralizedRCNN(cfg)[source]

Bases: torch.nn.modules.module.Module

Generalized R-CNN. Any model that contains the following three components:

1. Per-image feature extraction (aka backbone)

2. Region proposal generation

3. Per-region feature extraction and prediction

property device
visualize_training(batched_inputs, proposals)[source]

A function used to visualize images and proposals. It shows ground truth bounding boxes on the original image and up to 20 predicted object proposals on the original image. Users can implement different visualization functions for different models.

Parameters
• batched_inputs (list) – a list that contains input to the model.

• proposals (list) – a list that contains predicted proposals. Both batched_inputs and proposals should have the same length.

forward(batched_inputs)[source]
Parameters

batched_inputs

a list, batched outputs of DatasetMapper. Each item in the list contains the inputs for one image. For now, each item in the list is a dict that contains:

• “image”: Tensor, image in (C, H, W) format.

• “instances” (optional): groundtruth Instances

• “proposals” (optional): Instances, precomputed proposals.

Other information that’s included in the original dicts, such as:

• “height”, “width” (int): the output resolution of the model, used in inference. See postprocess() for details.

Returns

list[dict] – Each dict is the output for one input image. The dict contains one key “instances” whose value is a Instances. The Instances object has the following keys: “pred_boxes”, “pred_classes”, “scores”, “pred_masks”, “pred_keypoints”

inference(batched_inputs, detected_instances=None, do_postprocess=True)[source]

Run inference on the given inputs.

Parameters
• batched_inputs (list[dict]) – same as in forward()

• detected_instances (None or list[Instances]) – if not None, it contains an Instances object per image. The Instances object contains “pred_boxes” and “pred_classes” which are known boxes in the image. The inference will then skip the detection of bounding boxes, and only predict other per-ROI outputs.

• do_postprocess (bool) – whether to apply post-processing on the outputs.

Returns

same as in forward().

preprocess_image(batched_inputs)[source]

Normalize, pad and batch the input images.

class detectron2.modeling.PanopticFPN(cfg)[source]

Bases: torch.nn.modules.module.Module

Implement the paper Panoptic Feature Pyramid Networks.

property device
forward(batched_inputs)[source]
Parameters

batched_inputs

a list, batched outputs of DatasetMapper. Each item in the list contains the inputs for one image.

For now, each item in the list is a dict that contains:

• “image”: Tensor, image in (C, H, W) format.

• “instances”: Instances

• “sem_seg”: semantic segmentation ground truth.

• Other information that’s included in the original dicts, such as “height”, “width” (int): the output resolution of the model, used in inference. See postprocess() for details.

Returns

list[dict] – each dict contains the results for one image, with keys “instances” and “sem_seg” (in the formats of GeneralizedRCNN.forward() and SemanticSegmentor.forward(), respectively) and the combined “panoptic_seg” prediction.

class detectron2.modeling.ProposalNetwork(cfg)[source]

Bases: torch.nn.modules.module.Module

A meta architecture that only predicts object proposals.

property device
forward(batched_inputs)[source]

Parameters

batched_inputs – same as in GeneralizedRCNN.forward()

Returns

list[dict] – Each dict is the output for one input image. The dict contains one key “proposals” whose value is a Instances with keys “proposal_boxes” and “objectness_logits”.

class detectron2.modeling.RetinaNet(cfg)[source]

Bases: torch.nn.modules.module.Module

Implement RetinaNet in Focal Loss for Dense Object Detection.

property device
visualize_training(batched_inputs, results)[source]

A function used to visualize ground truth images and final network predictions. It shows ground truth bounding boxes on the original image and up to 20 predicted object bounding boxes on the original image.

Parameters
• batched_inputs (list) – a list that contains input to the model.

• results (List[Instances]) – a list of #images elements.

forward(batched_inputs)[source]
Parameters

batched_inputs

a list, batched outputs of DatasetMapper. Each item in the list contains the inputs for one image. For now, each item in the list is a dict that contains:

• “image”: Tensor, image in (C, H, W) format.

• “instances”: Instances

Other information that’s included in the original dicts, such as:

• “height”, “width” (int): the output resolution of the model, used in inference. See postprocess() for details.

Returns

dict[str->Tensor] – mapping from a named loss to a tensor storing the loss. Used during training only.

losses(gt_classes, gt_anchors_deltas, pred_class_logits, pred_anchor_deltas)[source]
Parameters
• gt_classes, gt_anchors_deltas – see RetinaNet.get_ground_truth(). Their shapes are (N, R) and (N, R, 4), respectively, where R is the total number of anchors across levels, i.e. sum(Hi x Wi x A).

• pred_class_logits, pred_anchor_deltas – see RetinaNetHead.forward().

Returns

dict[str, Tensor] – mapping from a named loss to a scalar tensor storing the loss. Used during training only. The dict keys are: “loss_cls” and “loss_box_reg”

get_ground_truth(anchors, targets)[source]
Parameters
• anchors (list[Boxes]) – A list of #feature level Boxes. The Boxes contains anchors of this image on the specific feature level.

• targets (list[Instances]) – a list of N Instances. The i-th Instances contains the ground-truth per-instance annotations for the i-th input image. Specify targets during training only.

Returns

gt_classes (Tensor) – An integer tensor of shape (N, R) storing ground-truth labels for each anchor.

R is the total number of anchors, i.e. the sum of Hi x Wi x A for all levels. Anchors with an IoU with some target higher than the foreground threshold are assigned their corresponding label in the [0, K-1] range. Anchors whose IoU are below the background threshold are assigned the label “K”. Anchors whose IoU are between the foreground and background thresholds are assigned a label “-1”, i.e. ignore.

gt_anchors_deltas (Tensor):

Shape (N, R, 4). The last dimension represents ground-truth box2box transform targets (dx, dy, dw, dh) that map each anchor to its matched ground-truth box. The values in the tensor are meaningful only when the corresponding anchor is labeled as foreground.
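The per-anchor labeling rule described above can be sketched with plain thresholds (an illustration only; the real implementation matches all anchors against all targets via an IoU matrix, and the 0.5/0.4 thresholds here are assumed defaults, not read from any config):

```python
def label_anchor(iou, num_classes, gt_class, fg_thresh=0.5, bg_thresh=0.4):
    """Assign a training label to one anchor from its best IoU with ground
    truth: the matched class in [0, K-1] if foreground, K (= num_classes)
    if background, -1 (ignore) if between the two thresholds."""
    if iou >= fg_thresh:
        return gt_class        # foreground: label in [0, K-1]
    if iou < bg_thresh:
        return num_classes     # background: label "K"
    return -1                  # in-between: ignored during training

assert label_anchor(0.7, num_classes=80, gt_class=3) == 3
assert label_anchor(0.2, num_classes=80, gt_class=3) == 80
assert label_anchor(0.45, num_classes=80, gt_class=3) == -1
```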

inference(box_cls, box_delta, anchors, image_sizes)[source]
Parameters
• box_cls, box_delta – same as the output of RetinaNetHead.forward()

• anchors (list[Boxes]) – A list of #feature level Boxes. The Boxes contain anchors of this image on the specific feature level.

• image_sizes (List[torch.Size]) – the input image sizes

Returns

results (List[Instances]) – a list of #images elements.

inference_single_image(box_cls, box_delta, anchors, image_size)[source]

Single-image inference. Return bounding-box detection results by thresholding on scores and applying non-maximum suppression (NMS).

Parameters
• box_cls (list[Tensor]) – list of #feature levels. Each entry contains tensor of size (H x W x A, K)

• box_delta (list[Tensor]) – Same shape as ‘box_cls’ except that K becomes 4.

• anchors (list[Boxes]) – list of #feature levels. Each entry contains a Boxes object, which contains all the anchors for that image in that feature level.

• image_size (tuple(H, W)) – a tuple of the image height and width.

Returns

Same as inference, but for only one image.

preprocess_image(batched_inputs)[source]

Normalize, pad and batch the input images.

class detectron2.modeling.SemanticSegmentor(cfg)[source]

Bases: torch.nn.modules.module.Module

Main class for semantic segmentation architectures.

property device
forward(batched_inputs)[source]
Parameters

batched_inputs

a list, batched outputs of DatasetMapper. Each item in the list contains the inputs for one image.

For now, each item in the list is a dict that contains:

• “image”: Tensor, image in (C, H, W) format.

• “sem_seg”: semantic segmentation ground truth

• Other information that’s included in the original dicts, such as “height”, “width” (int): the output resolution of the model, used in inference. See postprocess() for details.

Returns

list[dict] – Each dict is the output for one input image. The dict contains one key “sem_seg” whose value is a Tensor that represents the per-pixel segmentation predicted by the head. The prediction has shape KxHxW that represents the logits of each class for each pixel.

detectron2.modeling.build_model(cfg)[source]

Build the whole model architecture, defined by cfg.MODEL.META_ARCHITECTURE. Note that it does not load any weights from cfg.

detectron2.modeling.build_sem_seg_head(cfg, input_shape)[source]

Build a semantic segmentation head from cfg.MODEL.SEM_SEG_HEAD.NAME.

detectron2.modeling.detector_postprocess(results, output_height, output_width, mask_threshold=0.5)[source]

Resize the output instances. The input images are often resized when entering an object detector. As a result, we often need the outputs of the detector in a different resolution from its inputs.

This function will resize the raw outputs of an R-CNN detector to produce outputs according to the desired output resolution.

Parameters
• results (Instances) – the raw outputs from the detector. results.image_size contains the input image resolution the detector sees. This object might be modified in-place.

• output_height, output_width – the desired output resolution.

Returns

Instances – the resized output from the model, based on the output resolution
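The resizing can be sketched for a single box (a plain-Python illustration with a hypothetical `rescale_box` helper; the real function operates on an Instances object and also rescales masks and keypoints):

```python
def rescale_box(box, input_size, output_size):
    """Map an (x0, y0, x1, y1) box from the resolution the detector saw
    (its image_size, as (H, W)) to the desired output resolution."""
    in_h, in_w = input_size
    out_h, out_w = output_size
    sx, sy = out_w / in_w, out_h / in_h
    x0, y0, x1, y1 = box
    return (x0 * sx, y0 * sy, x1 * sx, y1 * sy)

# detector ran at 100x200; the user wants boxes at the original 200x400
assert rescale_box((10, 20, 30, 40), (100, 200), (200, 400)) == (20, 40, 60, 80)
```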

detectron2.modeling.build_proposal_generator(cfg, input_shape)[source]

Build a proposal generator from cfg.MODEL.PROPOSAL_GENERATOR.NAME. The name can be “PrecomputedProposals” to use no proposal generator.

detectron2.modeling.build_rpn_head(cfg, input_shape)[source]

Build an RPN head defined by cfg.MODEL.RPN.HEAD_NAME.

class detectron2.modeling.ROIHeads(*, num_classes=None, batch_size_per_image=None, positive_sample_fraction=None, proposal_matcher=None, proposal_append_gt=True)[source]

Bases: torch.nn.modules.module.Module

ROIHeads perform all per-region computation in an R-CNN.

It typically contains logic to:

1. (in training only) match proposals with ground truth and sample them

2. crop the regions and extract per-region features using proposals

3. make per-region predictions with different heads

It can have many variants, implemented as subclasses of this class. This base class contains the logic to match/sample proposals. But it is not necessary to inherit this class if the sampling logic is not needed.

classmethod from_config(cfg)[source]
label_and_sample_proposals(proposals: List[detectron2.structures.instances.Instances], targets: List[detectron2.structures.instances.Instances]) → List[detectron2.structures.instances.Instances][source]

Prepare some proposals to be used to train the ROI heads. It performs box matching between proposals and targets, and assigns training labels to the proposals. It returns self.batch_size_per_image random samples from proposals and groundtruth boxes, with a fraction of positives that is no larger than self.positive_sample_fraction.

Parameters

See ROIHeads.forward().

Returns

list[Instances] – length N list of Instances containing the proposals sampled for training. Each Instances has the following fields:

• proposal_boxes: the proposal boxes

• gt_boxes: the ground-truth box that the proposal is assigned to (this is only meaningful if the proposal has a label > 0; if label = 0 then the ground-truth box is random)

Other fields such as “gt_classes” and “gt_masks” that are included in targets.

forward(images: detectron2.structures.image_list.ImageList, features: Dict[str, torch.Tensor], proposals: List[detectron2.structures.instances.Instances], targets: Optional[List[detectron2.structures.instances.Instances]] = None) → Tuple[List[detectron2.structures.instances.Instances], Dict[str, torch.Tensor]][source]
Parameters
• images (ImageList) –

• features (dict[str,Tensor]) – input data as a mapping from feature map name to tensor. Axis 0 represents the number of images N in the input data; axes 1-3 are channels, height, and width, which may vary between feature maps (e.g., if a feature pyramid is used).

• proposals (list[Instances]) – length N list of Instances. The i-th Instances contains object proposals for the i-th input image, with fields “proposal_boxes” and “objectness_logits”.

• targets (list[Instances], optional) –

length N list of Instances. The i-th Instances contains the ground-truth per-instance annotations for the i-th input image. Specify targets during training only. It may have the following fields:

• gt_boxes: the bounding box of each instance.

• gt_classes: the label for each instance with a category ranging in [0, #class].

• gt_keypoints: NxKx3, the ground-truth keypoints for each instance.

Returns

list[Instances] – length N list of Instances containing the detected instances. Returned during inference only; may be [] during training.

dict[str->Tensor]: mapping from a named loss to a tensor storing the loss. Used during training only.

class detectron2.modeling.StandardROIHeads(*, box_in_features: List[str] = None, box_pooler: detectron2.modeling.poolers.ROIPooler = None, box_head: torch.nn.modules.module.Module = None, box_predictor: torch.nn.modules.module.Module = None, mask_in_features: Optional[List[str]] = None, mask_pooler: Optional[detectron2.modeling.poolers.ROIPooler] = None, mask_head: Optional[torch.nn.modules.module.Module] = None, keypoint_in_features: Optional[List[str]] = None, keypoint_pooler: Optional[detectron2.modeling.poolers.ROIPooler] = None, keypoint_head: Optional[torch.nn.modules.module.Module] = None, train_on_pred_boxes: bool = False, **kwargs)[source]

Bases: detectron2.modeling.roi_heads.roi_heads.ROIHeads

It’s “standard” in the sense that there is no ROI transform sharing or feature sharing between tasks. Each head independently processes the input features with its own pooler and head.

This class is used by most models, such as FPN and C5. To implement more models, you can subclass it and implement a different forward() or a head.

classmethod from_config(cfg, input_shape)[source]
forward(images: detectron2.structures.image_list.ImageList, features: Dict[str, torch.Tensor], proposals: List[detectron2.structures.instances.Instances], targets: Optional[List[detectron2.structures.instances.Instances]] = None) → Tuple[List[detectron2.structures.instances.Instances], Dict[str, torch.Tensor]][source]
forward_with_given_boxes(features: Dict[str, torch.Tensor], instances: List[detectron2.structures.instances.Instances]) → List[detectron2.structures.instances.Instances][source]

Use the given boxes in instances to produce other (non-box) per-ROI outputs.

This is useful for downstream tasks where a box is known, but need to obtain other attributes (outputs of other heads). Test-time augmentation also uses this.

Parameters
• features – same as in forward()

• instances (list[Instances]) – instances to predict other outputs. Expect the keys “pred_boxes” and “pred_classes” to exist.

Returns

instances (list[Instances]) – the same Instances objects, with extra fields such as pred_masks or pred_keypoints.

class detectron2.modeling.BaseMaskRCNNHead(*, vis_period=0)[source]

Bases: torch.nn.modules.module.Module

Implement the basic Mask R-CNN losses and inference logic described in Mask R-CNN.

classmethod from_config(cfg, input_shape)[source]
forward(x, instances: List[detectron2.structures.instances.Instances])[source]
Parameters
• x – input region feature(s) provided by ROIHeads.

• instances (list[Instances]) – contains the boxes & labels corresponding to the input features. Exact format is up to its caller to decide. Typically, this is the foreground instances in training, with “proposal_boxes” field and other gt annotations. In inference, it contains boxes that are already predicted.

Returns

A dict of losses in training. The predicted “instances” in inference.

layers(x)[source]

Neural network layers that make predictions from input features.

class detectron2.modeling.BaseKeypointRCNNHead(*, num_keypoints=None, loss_weight=1.0, loss_normalizer=1.0)[source]

Bases: torch.nn.modules.module.Module

Implement the basic Keypoint R-CNN losses and inference logic described in Mask R-CNN.

classmethod from_config(cfg, input_shape)[source]
forward(x, instances: List[detectron2.structures.instances.Instances])[source]
Parameters
• x – input region feature(s) provided by ROIHeads.

• instances (list[Instances]) – contains the boxes & labels corresponding to the input features. Exact format is up to its caller to decide. Typically, this is the foreground instances in training, with “proposal_boxes” field and other gt annotations. In inference, it contains boxes that are already predicted.

Returns

A dict of losses if in training. The predicted “instances” if in inference.

layers(x)[source]

Neural network layers that make predictions from regional input features.

detectron2.modeling.build_box_head(cfg, input_shape)[source]

Build a box head defined by cfg.MODEL.ROI_BOX_HEAD.NAME.

detectron2.modeling.build_keypoint_head(cfg, input_shape)[source]

detectron2.modeling.build_mask_head(cfg, input_shape)[source]

detectron2.modeling.build_roi_heads(cfg, input_shape)[source]

class detectron2.modeling.DatasetMapperTTA(cfg)[source]

Bases: object

Implement test-time augmentation for detection data. It is a callable which takes a dataset dict from a detection dataset, and returns a list of dataset dicts where the images are augmented from the input image by the transformations defined in the config.

__call__(dataset_dict)[source]
Parameters

dataset_dict (dict) – a detection dataset dict

Returns

list[dict] – a list of dataset dicts, which contain augmented version of the input image. The total number of dicts is len(min_sizes) * (2 if flip else 1).

class detectron2.modeling.GeneralizedRCNNWithTTA(cfg, model, tta_mapper=None, batch_size=3)[source]

Bases: torch.nn.modules.module.Module

A GeneralizedRCNN with test-time augmentation enabled. Its __call__() method has the same interface as GeneralizedRCNN.forward().

__init__(cfg, model, tta_mapper=None, batch_size=3)[source]
Parameters
• cfg (CfgNode) –

• model (GeneralizedRCNN) – a GeneralizedRCNN to apply TTA on.

• tta_mapper (callable) – takes a dataset dict and returns a list of augmented versions of the dataset dict. Defaults to DatasetMapperTTA(cfg).

• batch_size (int) – batch the augmented images into this batch size for inference.

__call__(batched_inputs)[source]

Same input/output format as GeneralizedRCNN.forward()

## detectron2.modeling.poolers module¶

class detectron2.modeling.poolers.ROIPooler(output_size, scales, sampling_ratio, pooler_type, canonical_box_size=224, canonical_level=4)[source]

Bases: torch.nn.modules.module.Module

Region of interest feature map pooler that supports pooling from one or more feature maps.

__init__(output_size, scales, sampling_ratio, pooler_type, canonical_box_size=224, canonical_level=4)[source]
Parameters
• output_size (int, tuple[int] or list[int]) – output size of the pooled region, e.g., 14 x 14. If tuple or list is given, the length must be 2.

• scales (list[float]) – The scale for each low-level pooling op relative to the input image. For a feature map with stride s relative to the input image, scale is defined as 1 / s. The stride must be a power of 2. When there are multiple scales, they must form a pyramid, i.e. they must be a monotonically decreasing geometric sequence with a factor of 1/2.

• sampling_ratio (int) – The sampling_ratio parameter for the ROIAlign op.

• pooler_type (string) – Name of the type of pooling operation that should be applied. For instance, “ROIPool” or “ROIAlignV2”.

• canonical_box_size (int) – A canonical box size in pixels (sqrt(box area)). The default is heuristically defined as 224 pixels in the FPN paper (based on ImageNet pre-training).

• canonical_level (int) –

The feature map level index from which a canonically-sized box should be placed. The default is defined as level 4 (stride=16) in the FPN paper, i.e., a box of size 224x224 will be placed on the feature with stride=16. The box placement for all boxes will be determined from their sizes w.r.t canonical_box_size. For example, a box whose area is 4x that of a canonical box should be used to pool features from feature level canonical_level+1.

Note that the actual input feature maps given to this module may not have sufficiently many levels for the input boxes. If the boxes are too large or too small for the input feature maps, the closest level will be used.
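The level-assignment heuristic described for canonical_box_size and canonical_level can be sketched as follows (an illustration of the heuristic from the FPN paper; `assign_fpn_level` and its min/max level bounds are assumptions, not detectron2's API):

```python
import math

def assign_fpn_level(box_area, canonical_box_size=224, canonical_level=4,
                     min_level=2, max_level=5):
    """A box of canonical size maps to canonical_level; every 4x change in
    box area moves it one level. Out-of-range boxes use the closest level."""
    level = math.floor(canonical_level +
                       math.log2(math.sqrt(box_area) / canonical_box_size))
    return max(min_level, min(max_level, level))

assert assign_fpn_level(224 * 224) == 4   # canonical box -> level 4 (stride 16)
assert assign_fpn_level(448 * 448) == 5   # 4x the area -> one level up
assert assign_fpn_level(16 * 16) == 2     # tiny boxes clamp to the lowest level
```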

forward(x, box_lists)[source]
Parameters
• x (list[Tensor]) – A list of feature maps of NCHW shape, with scales matching those used to construct this module.

• box_lists (list[Boxes] | list[RotatedBoxes]) – A list of N Boxes or N RotatedBoxes, where N is the number of images in the batch. The box coordinates are defined on the original image and will be scaled by the scales argument of ROIPooler.

Returns

Tensor – A tensor of shape (M, C, output_size, output_size) where M is the total number of boxes aggregated over all N batch images and C is the number of channels in x.

## detectron2.modeling.sampling module¶

detectron2.modeling.sampling.subsample_labels(labels, num_samples, positive_fraction, bg_label)[source]

Return num_samples (or fewer, if not enough found) random samples from labels which is a mixture of positives & negatives. It will try to return as many positives as possible without exceeding positive_fraction * num_samples, and then try to fill the remaining slots with negatives.

Parameters
• labels (Tensor) – (N, ) label vector with values: -1 = ignore; bg_label = background (“negative”) class; otherwise = one or more foreground (“positive”) classes.

• num_samples (int) – The total number of labels with value >= 0 to return. Values that are not sampled will be filled with -1 (ignore).

• positive_fraction (float) – The number of subsampled labels with values > 0 is min(num_positives, int(positive_fraction * num_samples)). The number of negatives sampled is min(num_negatives, num_samples - num_positives_sampled). In other words, if there are not enough positives, the sample is filled with negatives. If there are also not enough negatives, then as many elements are sampled as is possible.

• bg_label (int) – label index of background (“negative”) class.

Returns

pos_idx, neg_idx (Tensor) – 1D vector of indices. The total length of both is num_samples or fewer.
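The sampling policy can be sketched in plain Python (an illustration only; the real function works on tensors and returns index tensors, and `subsample` here is a hypothetical stand-in):

```python
import random

def subsample(labels, num_samples, positive_fraction, bg_label):
    """Pick up to positive_fraction * num_samples positives, then fill the
    remaining slots with negatives; return (pos_indices, neg_indices)."""
    pos = [i for i, l in enumerate(labels) if l != -1 and l != bg_label]
    neg = [i for i, l in enumerate(labels) if l == bg_label]
    num_pos = min(len(pos), int(positive_fraction * num_samples))
    num_neg = min(len(neg), num_samples - num_pos)
    return random.sample(pos, num_pos), random.sample(neg, num_neg)

labels = [0, 0, 0, 5, 7, -1, 0, 3]          # bg_label = 0, -1 = ignore
pos_idx, neg_idx = subsample(labels, num_samples=4,
                             positive_fraction=0.5, bg_label=0)
assert len(pos_idx) == 2 and len(neg_idx) == 2
assert all(labels[i] not in (-1, 0) for i in pos_idx)
```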

## detectron2.modeling.box_regression module¶

class detectron2.modeling.box_regression.Box2BoxTransform(weights: Tuple[float, float, float, float], scale_clamp: float = 4.135166556742356)[source]

Bases: object

The box-to-box transform defined in R-CNN. The transformation is parameterized by 4 deltas: (dx, dy, dw, dh). The transformation scales the box’s width and height by exp(dw), exp(dh) and shifts a box’s center by the offset (dx * width, dy * height).

__init__(weights: Tuple[float, float, float, float], scale_clamp: float = 4.135166556742356)[source]
Parameters
• weights (4-element tuple) – Scaling factors that are applied to the (dx, dy, dw, dh) deltas. In Fast R-CNN, these were originally set such that the deltas have unit variance; now they are treated as hyperparameters of the system.

• scale_clamp (float) – When predicting deltas, the predicted box scaling factors (dw and dh) are clamped such that they are <= scale_clamp.

get_deltas(src_boxes, target_boxes)[source]

Get box regression transformation deltas (dx, dy, dw, dh) that can be used to transform the src_boxes into the target_boxes. That is, the relation target_boxes == self.apply_deltas(deltas, src_boxes) is true (unless any delta is too large and is clamped).

Parameters
• src_boxes (Tensor) – source boxes, e.g., object proposals

• target_boxes (Tensor) – target of the transformation, e.g., ground-truth boxes.

apply_deltas(deltas, boxes)[source]

Apply transformation deltas (dx, dy, dw, dh) to boxes.

Parameters
• deltas (Tensor) – transformation deltas of shape (N, k*4), where k >= 1. deltas[i] represents k potentially different class-specific box transformations for the single box boxes[i].

• boxes (Tensor) – boxes to transform, of shape (N, 4)
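The transform math can be sketched for a single unweighted box (a plain-Python illustration of the (dx, dy, dw, dh) parameterization; the real class is vectorized over tensors, applies the weights described above, and clamps dw, dh by scale_clamp):

```python
import math

def get_deltas(src, tgt):
    """src, tgt: (x0, y0, x1, y1) boxes. Returns (dx, dy, dw, dh)."""
    sw, sh = src[2] - src[0], src[3] - src[1]
    scx, scy = src[0] + 0.5 * sw, src[1] + 0.5 * sh
    tw, th = tgt[2] - tgt[0], tgt[3] - tgt[1]
    tcx, tcy = tgt[0] + 0.5 * tw, tgt[1] + 0.5 * th
    return ((tcx - scx) / sw, (tcy - scy) / sh,
            math.log(tw / sw), math.log(th / sh))

def apply_deltas(deltas, box):
    """Shift the center by (dx * w, dy * h), scale w, h by exp(dw), exp(dh)."""
    dx, dy, dw, dh = deltas
    w, h = box[2] - box[0], box[3] - box[1]
    cx, cy = box[0] + 0.5 * w + dx * w, box[1] + 0.5 * h + dy * h
    nw, nh = w * math.exp(dw), h * math.exp(dh)
    return (cx - 0.5 * nw, cy - 0.5 * nh, cx + 0.5 * nw, cy + 0.5 * nh)

# round trip: apply_deltas(get_deltas(src, tgt), src) recovers tgt
src, tgt = (0.0, 0.0, 10.0, 10.0), (2.0, 4.0, 8.0, 16.0)
out = apply_deltas(get_deltas(src, tgt), src)
assert all(abs(a - b) < 1e-6 for a, b in zip(out, tgt))
```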

class detectron2.modeling.box_regression.Box2BoxTransformRotated(weights: Tuple[float, float, float, float, float], scale_clamp: float = 4.135166556742356)[source]

Bases: object

The box-to-box transform defined in Rotated R-CNN. The transformation is parameterized by 5 deltas: (dx, dy, dw, dh, da). The transformation scales the box’s width and height by exp(dw), exp(dh), shifts a box’s center by the offset (dx * width, dy * height), and rotates a box’s angle by da (radians). Note: angles of deltas are in radians while angles of boxes are in degrees.

__init__(weights: Tuple[float, float, float, float, float], scale_clamp: float = 4.135166556742356)[source]
Parameters
• weights (5-element tuple) – Scaling factors that are applied to the (dx, dy, dw, dh, da) deltas. These are treated as hyperparameters of the system.

• scale_clamp (float) – When predicting deltas, the predicted box scaling factors (dw and dh) are clamped such that they are <= scale_clamp.

get_deltas(src_boxes, target_boxes)[source]

Get box regression transformation deltas (dx, dy, dw, dh, da) that can be used to transform the src_boxes into the target_boxes. That is, the relation target_boxes == self.apply_deltas(deltas, src_boxes) is true (unless any delta is too large and is clamped).

Parameters
• src_boxes (Tensor) – Nx5 source boxes, e.g., object proposals

• target_boxes (Tensor) – Nx5 target of the transformation, e.g., ground-truth boxes.

apply_deltas(deltas, boxes)[source]

Apply transformation deltas (dx, dy, dw, dh, da) to boxes.

Parameters
• deltas (Tensor) – transformation deltas of shape (N, 5). deltas[i] represents box transformation for the single box boxes[i].

• boxes (Tensor) – boxes to transform, of shape (N, 5)

## Model Registries¶

These are the different registries provided in modeling. Each registry provides the ability to replace a component with your own customized one, without modifying detectron2’s code.

Note that a registry cannot let you customize arbitrary lines of code. Even to add one line somewhere, you’ll likely need to find the smallest registry that contains that line, and register your component to that registry.
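The registry mechanism can be sketched with a minimal decorator-based registry (an illustration of the pattern only; the real registries below come from fvcore and have a richer interface):

```python
class Registry:
    """Minimal name -> object registry with a decorator interface."""

    def __init__(self, name):
        self._name, self._objs = name, {}

    def register(self, obj=None):
        def deco(o):
            self._objs[o.__name__] = o
            return o
        return deco(obj) if obj is not None else deco

    def get(self, name):
        return self._objs[name]

BACKBONE_REGISTRY = Registry("BACKBONE")

@BACKBONE_REGISTRY.register()
class MyBackbone:
    pass

# a config string such as cfg.MODEL.BACKBONE.NAME = "MyBackbone"
# is then resolved through the registry:
assert BACKBONE_REGISTRY.get("MyBackbone") is MyBackbone
```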

detectron2.modeling.META_ARCH_REGISTRY = <fvcore.common.registry.Registry object>

Registry for meta-architectures, i.e. the whole model.

The registered object will be called with obj(cfg) and expected to return a nn.Module object.

detectron2.modeling.BACKBONE_REGISTRY = <fvcore.common.registry.Registry object>

Registry for backbones, which extract feature maps from images.

The registered object must be a callable that accepts two arguments:

1. A detectron2.config.CfgNode

2. A detectron2.layers.ShapeSpec, which contains the input shape specification.

It must return an instance of Backbone.

detectron2.modeling.PROPOSAL_GENERATOR_REGISTRY = <fvcore.common.registry.Registry object>

Registry for proposal generators, which produce object proposals from feature maps.

The registered object will be called with obj(cfg, input_shape). The call should return a nn.Module object.

detectron2.modeling.RPN_HEAD_REGISTRY = <fvcore.common.registry.Registry object>

Registry for RPN heads, which take feature maps and perform objectness classification and bounding box regression for anchors.

The registered object will be called with obj(cfg, input_shape). The call should return a nn.Module object.

detectron2.modeling.ANCHOR_GENERATOR_REGISTRY = <fvcore.common.registry.Registry object>

Registry for modules that create object detection anchors for feature maps.

The registered object will be called with obj(cfg, input_shape).

detectron2.modeling.ROI_HEADS_REGISTRY = <fvcore.common.registry.Registry object>

Registry for ROI heads in a generalized R-CNN model. ROIHeads take feature maps and region proposals, and perform per-region computation.

The registered object will be called with obj(cfg, input_shape). The call is expected to return an ROIHeads.

detectron2.modeling.ROI_BOX_HEAD_REGISTRY = <fvcore.common.registry.Registry object>

Registry for box heads, which make box predictions from per-region features.

The registered object will be called with obj(cfg, input_shape).

detectron2.modeling.ROI_MASK_HEAD_REGISTRY = <fvcore.common.registry.Registry object>

Registry for mask heads, which predict instance masks given per-region features.

The registered object will be called with obj(cfg, input_shape).

detectron2.modeling.ROI_KEYPOINT_HEAD_REGISTRY = <fvcore.common.registry.Registry object>

Registry for keypoint heads, which make keypoint predictions from per-region features.

The registered object will be called with obj(cfg, input_shape).