
Synthetic Data Generation with Omniverse Replicator: Full Workflow

In AI development, the quality and diversity of datasets directly impact model performance. This is especially true in domains like 3D vision, robotic perception, and autonomous systems—where training data needs to reflect the complexity of real-world environments.


Omniverse Replicator, developed by NVIDIA, addresses this need by enabling the creation of large-scale, photorealistic synthetic datasets for AI training. In this post, we’ll walk through how to generate annotated 3D image datasets using Omniverse Replicator—from scene setup to randomized data generation.

What is Omniverse Replicator?

Omniverse Replicator is a synthetic data generation tool within the NVIDIA Omniverse ecosystem. It allows users to simulate real-world variability through randomized lighting, materials, camera perspectives, and object placement—making it ideal for AI training pipelines.

Official Docs: Omniverse Replicator Extension


With Replicator, you can generate not only realistic RGB images but also corresponding annotations like bounding boxes, segmentation maps, depth, and semantic labels, all in a single pipeline.

Environment Setup: Launching Omniverse Code

To use Replicator, you'll need to install and launch Omniverse Code via the Omniverse Launcher.


Within Omniverse Code, you can build 3D scenes and customize synthetic data generation logic using Python extensions.


To begin coding custom extensions, you can connect Visual Studio Code (VSCode) to the Omniverse environment.


Semantic Labeling: Assigning Classes to Objects

A key part of synthetic data generation is semantic annotation.


Using the Semantics Schema Editor extension in Omniverse Code, you can assign class labels to 3D objects in your scene.


For example:

  • Semantics Type: class
  • Semantics Data: plastic_folding_box_2, metal_box, wooden_cube, etc.

Each object is identified by its Prim path, allowing Replicator to auto-generate annotations during rendering.
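Besides the Semantics Schema Editor UI, labels can also be assigned programmatically from a script. Below is a minimal sketch using the Replicator Python API; the prim path pattern and the `metal_box` label are examples, and the import is guarded so the file can be opened outside Omniverse without errors (it only executes inside Omniverse Code's Script Editor, where `omni.replicator.core` is available).

```python
# Sketch: programmatic semantic labeling with the Replicator API.
# The prim path pattern and class label below are examples.
try:
    import omni.replicator.core as rep
    IN_OMNIVERSE = True
except ImportError:
    IN_OMNIVERSE = False  # outside Omniverse: define only, never run

def label_metal_boxes():
    # Look up existing prims in the stage by their Prim paths and
    # attach a ('class', <label>) semantic pair to each match.
    boxes = rep.get.prims(path_pattern="/World/metal_box.*")
    with boxes:
        rep.modify.semantics([("class", "metal_box")])

if IN_OMNIVERSE:
    label_metal_boxes()
```

Either way, the semantic pair ends up on the prim itself, so the annotations described above work identically whether labels were assigned in the UI or in code.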

Setting Up Cameras and Lighting

You can freely place multiple cameras and configure lighting to simulate different viewpoints and environmental conditions.
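As a rough illustration, a camera, its render product (which fixes the output resolution), and two lights can be created in a few lines. The positions, resolution, and intensity below are placeholder values, and the import is guarded as before so the sketch is harmless outside Omniverse.

```python
# Sketch: camera, render product, and lights via the Replicator API.
# All numeric values are example placeholders.
try:
    import omni.replicator.core as rep
    IN_OMNIVERSE = True
except ImportError:
    IN_OMNIVERSE = False  # outside Omniverse: define only, never run

def setup_camera_and_lights():
    # A camera aimed at the scene origin, plus a render product that
    # defines the resolution used by all attached annotators.
    camera = rep.create.camera(position=(3, 3, 3), look_at=(0, 0, 0))
    render_product = rep.create.render_product(camera, resolution=(1024, 1024))
    # A distant "sun" light and a dome light for ambient illumination.
    rep.create.light(light_type="distant", intensity=3000)
    rep.create.light(light_type="dome")
    return camera, render_product

if IN_OMNIVERSE:
    setup_camera_and_lights()
```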


Defining Randomizers for Object Variation

To ensure dataset diversity, you can define randomizer functions that change the position, rotation, and color of scene objects.


You can also randomize lighting, dome textures, camera angles, and more.
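A typical randomizer is just a Python function registered with Replicator. The sketch below assumes objects were labeled with a `('class', 'metal_box')` semantic pair (an example label from earlier in this post); the position and rotation ranges are placeholders you would tune to your scene.

```python
# Sketch: a registered randomizer that scatters and recolors objects.
# The 'metal_box' label and all value ranges are examples.
try:
    import omni.replicator.core as rep
    IN_OMNIVERSE = True
except ImportError:
    IN_OMNIVERSE = False  # outside Omniverse: define only, never run

def scatter_boxes():
    # Collect every prim carrying the 'metal_box' class label and give
    # each a new random pose and color whenever the randomizer fires.
    boxes = rep.get.prims(semantics=[("class", "metal_box")])
    with boxes:
        rep.modify.pose(
            position=rep.distribution.uniform((-1, 0, -1), (1, 0, 1)),
            rotation=rep.distribution.uniform((0, -180, 0), (0, 180, 0)),
        )
        rep.randomizer.color(
            colors=rep.distribution.uniform((0, 0, 0), (1, 1, 1))
        )
    return boxes.node

if IN_OMNIVERSE:
    rep.randomizer.register(scatter_boxes)
```

Once registered, the randomizer can be invoked from a frame trigger so a fresh variation is sampled for every rendered frame.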

Generating and Saving the Dataset

Finally, you define how many frames to render and where to save the output images and labels.


Such a script generates, for example, 100 RGB images with corresponding annotations in the specified output directory.
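A minimal generation script might look like the following sketch. The frame count, output directory, annotator flags, and the inline `metal_box` pose randomization are assumptions; `render_product` is whatever render product you created for your camera, and the guarded import again keeps the file inert outside Omniverse.

```python
# Sketch of a Replicator generation script (run inside Omniverse Code).
# Frame count, output directory, and the 'metal_box' label are examples.
try:
    import omni.replicator.core as rep
    IN_OMNIVERSE = True
except ImportError:
    IN_OMNIVERSE = False  # outside Omniverse: define only, never run

def generate_dataset(render_product, num_frames=100, output_dir="_out_boxes"):
    # Re-randomize labeled objects on every rendered frame.
    with rep.trigger.on_frame(num_frames=num_frames):
        boxes = rep.get.prims(semantics=[("class", "metal_box")])
        with boxes:
            rep.modify.pose(
                position=rep.distribution.uniform((-1, 0, -1), (1, 0, 1))
            )
    # BasicWriter dumps RGB plus the requested annotations per frame.
    writer = rep.WriterRegistry.get("BasicWriter")
    writer.initialize(
        output_dir=output_dir,
        rgb=True,
        bounding_box_2d_tight=True,
        semantic_segmentation=True,
    )
    writer.attach([render_product])
    rep.orchestrator.run()
```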

Visualizing the Output

Once the generation process is complete, you can browse the saved .png images and verify annotations such as bounding boxes and segmentation labels. These synthetic datasets can now be directly used to train or validate AI models.
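A small helper makes it easy to pair each image with its annotation files before feeding them to a training pipeline. The filenames below follow the typical BasicWriter convention (e.g. `rgb_0000.png`, `bounding_box_2d_tight_0000.npy`), but exact naming can vary between Replicator versions, so treat this as a sketch.

```python
import re
from collections import defaultdict

def index_output(filenames):
    """Group BasicWriter-style output files by frame number.

    Expects names like 'rgb_0000.png' or 'bounding_box_2d_tight_0000.npy';
    naming may differ by Replicator version.
    """
    frames = defaultdict(dict)
    pattern = re.compile(r"^(?P<kind>.+?)_(?P<frame>\d+)\.(?:png|npy|json)$")
    for name in filenames:
        m = pattern.match(name)
        if m:
            frames[int(m.group("frame"))][m.group("kind")] = name
    return dict(frames)

# Example with hypothetical filenames, e.g. from os.listdir(output_dir):
listing = ["rgb_0000.png", "bounding_box_2d_tight_0000.npy", "rgb_0001.png"]
frames = index_output(listing)
# frames[0] now maps each annotation kind ('rgb', 'bounding_box_2d_tight',
# ...) to the file for frame 0.
```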

Final Thoughts

Omniverse Replicator is a powerful tool for creating large, realistic, and fully annotated synthetic datasets. It eliminates the need for manual image collection and labeling, making it a great asset for any AI team working on 3D perception, robotics, or simulation-based training.

From lighting to object behavior and semantic tagging, everything can be automated within the same environment. This enables rapid iteration, reproducibility, and scalable dataset creation for high-performance AI development.


Copyright 2025. POLLUX All rights reserved.