Extend Detectron2’s Defaults

Research is about doing things in new ways. This creates a tension in how to design abstractions in code, which is a challenge for any research engineering project of significant size:

  1. On one hand, it needs to have very thin abstractions to allow for the possibility of doing everything in new ways. It should be reasonably easy to break existing abstractions and replace them with new ones.

  2. On the other hand, such a project also needs reasonably high-level abstractions, so that users can easily do things in standard ways, without worrying too much about the details that only certain researchers care about.

In detectron2, two types of interfaces address this tension together, with a third that combines them:

  1. Functions and classes that take a config (cfg) argument created from a yaml file (sometimes with a few extra arguments).

    Such functions and classes implement the “standard default” behavior: they read what they need from a given config and do the “standard” thing. Users only need to load an expert-made config and pass it around, without having to worry about which arguments are used and what they all mean.

    See Yacs Configs for a detailed tutorial.

  2. Functions and classes that have well-defined explicit arguments.

    Each of these is a small building block of the entire system. They require users’ expertise to understand what each argument should be, and they take more effort to stitch together into a larger system. But they can be stitched together in more flexible ways.

    When you need to implement something not supported by the “standard defaults” included in detectron2, these well-defined components can be reused.

    The LazyConfig system relies on such functions and classes.

  3. A few functions and classes are implemented with the @configurable decorator: they can be called with a config, with explicit arguments, or with a mixture of both. Their explicit argument interfaces are currently experimental.

    As an example, a Mask R-CNN model can be built in the following ways:

    1. Config-only:

      # load proper yaml config file, then
      model = build_model(cfg)
      
    2. Mixture of config and additional argument overrides:

      model = GeneralizedRCNN(
        cfg,
        roi_heads=StandardROIHeads(cfg, batch_size_per_image=666),
        pixel_std=[57.0, 57.0, 57.0])
      
    3. Full explicit arguments:

    model = GeneralizedRCNN(
        backbone=FPN(
            ResNet(
                BasicStem(3, 64, norm="FrozenBN"),
                ResNet.make_default_stages(50, stride_in_1x1=True, norm="FrozenBN"),
                out_features=["res2", "res3", "res4", "res5"],
            ).freeze(2),
            ["res2", "res3", "res4", "res5"],
            256,
            top_block=LastLevelMaxPool(),
        ),
        proposal_generator=RPN(
            in_features=["p2", "p3", "p4", "p5", "p6"],
            head=StandardRPNHead(in_channels=256, num_anchors=3),
            anchor_generator=DefaultAnchorGenerator(
                sizes=[[32], [64], [128], [256], [512]],
                aspect_ratios=[0.5, 1.0, 2.0],
                strides=[4, 8, 16, 32, 64],
                offset=0.0,
            ),
            anchor_matcher=Matcher([0.3, 0.7], [0, -1, 1], allow_low_quality_matches=True),
            box2box_transform=Box2BoxTransform([1.0, 1.0, 1.0, 1.0]),
            batch_size_per_image=256,
            positive_fraction=0.5,
            pre_nms_topk=(2000, 1000),
            post_nms_topk=(1000, 1000),
            nms_thresh=0.7,
        ),
        roi_heads=StandardROIHeads(
            num_classes=80,
            batch_size_per_image=512,
            positive_fraction=0.25,
            proposal_matcher=Matcher([0.5], [0, 1], allow_low_quality_matches=False),
            box_in_features=["p2", "p3", "p4", "p5"],
            box_pooler=ROIPooler(7, (1.0 / 4, 1.0 / 8, 1.0 / 16, 1.0 / 32), 0, "ROIAlignV2"),
            box_head=FastRCNNConvFCHead(
                ShapeSpec(channels=256, height=7, width=7), conv_dims=[], fc_dims=[1024, 1024]
            ),
            box_predictor=FastRCNNOutputLayers(
                ShapeSpec(channels=1024),
                test_score_thresh=0.05,
                box2box_transform=Box2BoxTransform((10, 10, 5, 5)),
                num_classes=80,
            ),
            mask_in_features=["p2", "p3", "p4", "p5"],
            mask_pooler=ROIPooler(14, (1.0 / 4, 1.0 / 8, 1.0 / 16, 1.0 / 32), 0, "ROIAlignV2"),
            mask_head=MaskRCNNConvUpsampleHead(
                ShapeSpec(channels=256, width=14, height=14),
                num_classes=80,
                conv_dims=[256, 256, 256, 256, 256],
            ),
        ),
        pixel_mean=[103.530, 116.280, 123.675],
        pixel_std=[1.0, 1.0, 1.0],
        input_format="BGR",
    )
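
To make the three calling styles above concrete, here is a minimal pure-Python sketch of how a decorator in the spirit of @configurable could work. It is a simplified stand-in and not detectron2's implementation: the `configurable` function below, the `ToyROIHeads` class, and the `SimpleNamespace` config are all hypothetical names introduced for illustration. The sketch assumes the class supplies a `from_config` classmethod that translates a config into explicit keyword arguments, and that keyword arguments given by the caller take precedence over values read from the config.

```python
import functools
from types import SimpleNamespace


def configurable(init_func):
    """Hypothetical, simplified stand-in for a @configurable-style decorator.

    If a config is passed (positionally or as `cfg=`), the class's
    `from_config` classmethod turns it into explicit keyword arguments.
    Keyword arguments supplied by the caller override the config values.
    """

    @functools.wraps(init_func)
    def wrapped(self, *args, **kwargs):
        cfg = kwargs.pop("cfg", None)
        # Assumption of this sketch: a config is a SimpleNamespace.
        if cfg is None and args and isinstance(args[0], SimpleNamespace):
            cfg, args = args[0], args[1:]
        if cfg is not None:
            explicit = type(self).from_config(cfg)
            explicit.update(kwargs)  # caller overrides win over the config
            init_func(self, *args, **explicit)
        else:
            init_func(self, *args, **kwargs)

    return wrapped


class ToyROIHeads:
    """Illustrative component; not a detectron2 class."""

    @configurable
    def __init__(self, *, num_classes, batch_size_per_image):
        self.num_classes = num_classes
        self.batch_size_per_image = batch_size_per_image

    @classmethod
    def from_config(cls, cfg):
        # Map config fields to the explicit constructor arguments.
        return {
            "num_classes": cfg.NUM_CLASSES,
            "batch_size_per_image": cfg.BATCH_SIZE_PER_IMAGE,
        }


cfg = SimpleNamespace(NUM_CLASSES=80, BATCH_SIZE_PER_IMAGE=512)

from_cfg = ToyROIHeads(cfg)                                   # config-only
mixed = ToyROIHeads(cfg, batch_size_per_image=666)            # config + override
explicit = ToyROIHeads(num_classes=80, batch_size_per_image=256)  # explicit only
```

The design point is that the explicit-argument constructor stays the single source of truth, and the config path is only a translation layer on top of it, so the same class serves both standard and customized use.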
    

If you only need the standard behavior, the Beginner’s Tutorial should suffice. If you need to extend detectron2 to your own needs, see the following tutorials for more details:

  • Detectron2 includes a few standard datasets. To use custom ones, see Use Custom Datasets.

  • Detectron2 contains the standard logic that creates a data loader for training/testing from a dataset, but you can write your own as well. See Use Custom Data Loaders.

  • Detectron2 implements many standard detection models, and provides ways for you to override their behaviors. See Use Models and Write Models.

  • Detectron2 provides a default training loop that is good for common training tasks. You can customize it with hooks, or write your own loop instead. See training.
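
As a rough illustration of customizing a training loop with hooks, the following pure-Python sketch shows the general pattern: the loop calls every registered hook before and after each step, so extra behavior can be added without editing the loop. All names here (`ToyHook`, `LossLogger`, `ToyTrainer`, `run_step`) are hypothetical and chosen for this example; the loss computation is a placeholder, and detectron2's own hook interface is described in the training tutorial.

```python
class ToyHook:
    """Hypothetical minimal hook interface: no-op callbacks by default."""

    def before_step(self, trainer):
        pass

    def after_step(self, trainer):
        pass


class LossLogger(ToyHook):
    """Example hook that records (iteration, loss) after every step."""

    def __init__(self):
        self.history = []

    def after_step(self, trainer):
        self.history.append((trainer.iter, trainer.last_loss))


class ToyTrainer:
    """Illustrative loop; a real trainer would run a model and optimizer."""

    def __init__(self, hooks):
        self.hooks = list(hooks)
        self.iter = 0
        self.last_loss = None

    def run_step(self):
        # Placeholder for forward pass, backward pass, and optimizer step.
        self.last_loss = 1.0 / (self.iter + 1)

    def train(self, max_iter):
        for self.iter in range(max_iter):
            for hook in self.hooks:
                hook.before_step(self)
            self.run_step()
            for hook in self.hooks:
                hook.after_step(self)


logger = LossLogger()
ToyTrainer([logger]).train(max_iter=3)
```

Writing your own loop instead amounts to replacing `train` entirely, which is the more flexible but more laborious option.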