# Sim Scene Configuration

This page explains the simulation scene API in `python/rcs/envs/scenes.py` and shows how the example scene configs in `python/rcs/envs/configs.py` fit together.

If you are new to the frame conventions, read [RCS Conventions](conventions.md) first.

```text
world frame
└── root frame
    ├── all composed scene objects live here
    └── shared base frame (same kinematic node as root frame)
        (common robot-coordinate frame)
        ├── i-th robot base
        │   └── i-th robot attachment_site
        ├── j-th robot base
        │   └── j-th robot attachment_site
        └── ...
```

## The simple mental model

When building an RCS sim scene, it helps to think about five frames:

1. **World frame**
   - The global MuJoCo frame.
   - This is the outermost reference for the whole scene.
2. **Root frame**
   - The scene-local frame for the composed robot setup.
   - In the usual composed setup, all robot-scene objects are placed relative to this frame.
3. **Shared base frame**
   - The common coordinate frame used for all robot actions and observations.
   - It is attached to the same kinematic node as the root frame (offset by `shared_base_frame_to_root_frame`), but represents the common robot coordinate convention.
4. **i-th robot base frame**
   - The base frame of one specific robot.
   - Low-level kinematics and Cartesian commands are expressed here.
5. **i-th robot `attachment_site`**
   - The end-effector mounting frame for that robot, before any optional `tcp_offset`.
   - Use this for wrist-mounted objects, cameras, and tools.

A good rule of thumb:

- want the outer global reference -> **world frame**
- want the composed scene placement frame -> **root frame**
- want one common coordinate frame for all robots -> **shared base frame**
- want per-robot kinematics -> **i-th robot base frame**
- want wrist or tool mounting -> **i-th robot `attachment_site`**

If needed, extra `world_frame_objects` can still be placed directly in world coordinates.
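Because each frame in the tree above is a rigid transform of its parent, a point expressed in a robot base frame can be re-expressed in the world by walking up the tree one parent at a time. The sketch below illustrates this with plain NumPy 4x4 homogeneous matrices rather than `rcs.common.Pose`; the helpers `make_pose` and `to_parent` and all numeric values are hypothetical and only mirror the frame names used on this page.

```python
import numpy as np


def make_pose(translation, yaw_rad=0.0):
    """Hypothetical helper: 4x4 homogeneous transform with a rotation about z."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    pose = np.eye(4)
    pose[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    pose[:3, 3] = translation
    return pose


def to_parent(child_to_parent, point_in_child):
    """Re-express a 3D point given in a child frame in its parent frame."""
    return (child_to_parent @ np.append(point_in_child, 1.0))[:3]


# Illustrative values only (not taken from configs.py).
root_frame_to_world = make_pose([1.0, 0.0, 0.0], yaw_rad=np.pi / 2)  # rig moved and turned 90 deg
shared_base_frame_to_root_frame = make_pose([0.0, 0.0, 0.5])         # shared base 0.5 m above root
robot_to_shared_base_frame = make_pose([0.0, 0.3, 0.0])              # robot base 0.3 m to the side

# A point 0.1 m in front of the robot base, walked up the frame tree.
p_robot = np.array([0.1, 0.0, 0.0])
p_shared = to_parent(robot_to_shared_base_frame, p_robot)
p_root = to_parent(shared_base_frame_to_root_frame, p_shared)
p_world = to_parent(root_frame_to_world, p_root)

# Composing the chain once gives the same answer.
robot_to_world = root_frame_to_world @ shared_base_frame_to_root_frame @ robot_to_shared_base_frame
assert np.allclose(to_parent(robot_to_world, p_robot), p_world)
print(p_world)
```

Composing the whole chain into one transform, as the last lines do, is the same idea the next section describes for `SimEnvCreator.create_model()`.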
## How the main placement frames work

For composed robot scenes, `SimEnvCreator.create_model()` combines three scene-placement transforms in this order:

```python
robot2world = root_frame_to_world * shared_base_frame_to_root_frame * robot_to_shared_base_frame[robot_name]
```

In simple terms:

- `root_frame_to_world` places the whole rig into the MuJoCo world
- `shared_base_frame_to_root_frame` defines the shared robot coordinate frame relative to the root frame
- `robot_to_shared_base_frame` places each robot relative to that shared robot frame

### Single-robot intuition

For a simple single-arm scene, all three can often be identity transforms:

```python
import rcs

# Identity poses: every high-level frame coincides with the world origin.
root_frame_to_world = rcs.common.Pose()
shared_base_frame_to_root_frame = rcs.common.Pose()
robot_to_shared_base_frame = {"robot": rcs.common.Pose()}
```

That means the robot base, shared base, root frame, and world origin all coincide.

### Dual-arm intuition

The FR3 duo example in `python/rcs/envs/configs.py` uses these frames more meaningfully:

- `root_frame_to_world` keeps the whole duo rig aligned with the world
- `shared_base_frame_to_root_frame` lifts the shared base to the duo mount height
- `robot_to_shared_base_frame` offsets the left and right robots from the center

That is why the dual-arm example can expose one common action frame while still placing each robot correctly.

## `SimEnvCreatorConfig` keys

This is the main top-level config for the scene API.
| Key | What it controls | Typical use |
| --- | --- | --- |
| `robot_cfgs` | Maps robot names to `SimRobotConfig` | Define one robot or multiple named robots such as `"left"` and `"right"` |
| `sim_cfg` | MuJoCo runtime settings (`SimConfig`) | Realtime, async control, frequency, convergence behavior |
| `control_mode` | Action representation | For example `ControlMode.CARTESIAN_TQuat` |
| `task_cfg` | Optional task-specific config | Add pick/place or other task logic |
| `scene` | Base scene XML path or scene key | Usually from `SCENE_PATHS[...]` |
| `gripper_cfgs` | Optional gripper config per robot | Add one gripper per robot |
| `camera_cfgs` | Optional camera config dictionary | Define resolution, type, and frame rate for named cameras |
| `max_relative_movement` | Relative action limit | Limit the per-step Cartesian delta |
| `relative_to` | Relative action reference | Usually `RelativeTo.LAST_STEP` or `RelativeTo.NONE` |
| `robot_to_shared_base_frame` | Pose of each robot base relative to the shared base frame | Multi-robot layouts |
| `add_gravcomp` | Add gravity compensation to the composed scene | Often useful for manipulation scenes |
| `wrapper_cfg` | Wrapper behavior flags (`WrapperConfig`) | Binary gripper, home-on-reset, depth output |
| `headless` | GUI toggle | `True` for no GUI |
| `shared_base_frame_to_root_frame` | Pose of the shared base frame relative to the root frame | Move the shared command origin inside the rig |
| `root_frame_to_world` | Pose of the root frame relative to the MuJoCo world | Place the whole setup in the room |
| `alternative_combined_robot_mjcf` | Use a pre-combined robot MJCF instead of composing robots one by one | Advanced custom scenes |
| `world_frame_objects` | Objects placed directly in world coordinates | Loose props, room-fixed assets |
| `root_frame_objects` | Objects placed in root-frame coordinates | Tables, mounts, fixtures that should move with the rig |
| `robot_frame_objects` | Objects attached in a robot `attachment_site` frame | Wrist mounts, end-effector payloads |
| `camera_adds` | Cameras to add to the scene (`CameraAdderConfig`) | Fixed overhead cameras or wrist cameras |
| `gripper_offsets` | Pose offsets for mounted grippers | Align visual or tool frames |
| `_original_cfg` | Internal helper used after prefixing | Usually ignore this in user code |

## What usually lives inside the nested configs

The scene config mostly wires together three lower-level config types.

### `robot_cfgs: dict[str, SimRobotConfig]`

Each robot entry usually defines things such as:

- robot type
- kinematic model path
- `attachment_site`
- `tcp_offset`
- joint names and actuator names
- base link name
- degrees of freedom, joint limits, and `q_home`

The single-arm and dual-arm examples in `python/rcs/envs/configs.py` are good templates for this.

### `gripper_cfgs: dict[str, SimGripperConfig]`

Each gripper entry usually defines:

- gripper type
- gripper joint names and actuator name
- min/max width or actuator range
- collision geometry settings
- callback timing

### `camera_cfgs: dict[str, SimCameraConfig]`

Each camera entry usually defines:

- camera identifier
- camera type
- resolution
- frame rate

A useful pattern is:

- `camera_cfgs` defines the camera runtime properties
- `camera_adds` defines where that camera is placed in the scene

## The most important nested configs

### `WrapperConfig`

`WrapperConfig` controls the behavior of the environment wrappers around the raw simulation.

| Key | Meaning |
| --- | --- |
| `binary_gripper` | If `True`, gripper commands are treated as open/close instead of a continuous width |
| `home_on_reset` | If `True`, the robot returns home during reset |
| `include_depth` | If `True`, camera wrappers include depth images. These are metric depth values scaled by `BaseCameraSet.DEPTH_SCALE = 1000` and stored as `uint16`, so they are effectively in millimeters |

### `CameraAdderConfig`

`CameraAdderConfig` describes how a camera is added to the scene.
| Key | Meaning |
| --- | --- |
| `xml_path` | Optional camera XML asset to insert directly |
| `fovy` | Camera field of view, used when creating a camera directly |
| `offset` | Camera pose offset |
| `attachment_site` | Attachment site to use if mounted on a robot |
| `robot_name` | If set, mount the camera on that robot; otherwise add it as a scene camera |

The important frame detail is:

- if `robot_name` is **not** set, `offset` is interpreted in the **root frame** and then moved into the world by `root_frame_to_world`
- if `robot_name` **is** set, `offset` is interpreted relative to that robot's **attachment site**

## Easy examples

### Minimal single-robot scene

This is the basic shape used by `EmptyWorldFR3` in `python/rcs/envs/configs.py`:

```python
# robot_cfg, gripper_cfg, bird_eye_cfg, and wrist_cfg are assumed to be
# defined as in EmptyWorldFR3.
cfg = SimEnvCreatorConfig(
    robot_cfgs={"robot": robot_cfg},
    sim_cfg=SimConfig(async_control=False, realtime=True, frequency=1),
    control_mode=ControlMode.CARTESIAN_TQuat,
    scene=SCENE_PATHS["empty_world"],
    gripper_cfgs={"robot": gripper_cfg},
    camera_cfgs={"bird_eye": bird_eye_cfg, "wrist": wrist_cfg},
    # Identity transforms: all high-level frames share one origin.
    robot_to_shared_base_frame={"robot": rcs.common.Pose()},
    shared_base_frame_to_root_frame=rcs.common.Pose(),
    root_frame_to_world=rcs.common.Pose(),
)
```

What this means in plain language:

- there is one robot named `robot`
- it uses Cartesian `TQuat` (translation + quaternion) actions
- the base scene is the empty world
- there is one gripper on the robot
- there are two cameras, `bird_eye` and `wrist`
- all high-level frames start at the same origin

### Dual-arm scene

This is the important part of the `EmptyWorldFR3Duo` example:

```python
robot_cfgs = {"left": robot_cfg_left, "right": robot_cfg_right}

# Each robot base is offset from the center of the duo mount.
robot_to_shared_base_frame = {
    "left": DEFAULT_TRANSFORMS["FR3_DUOMOUNT_LEFT_ROBOT"],
    "right": DEFAULT_TRANSFORMS["FR3_DUOMOUNT_RIGHT_ROBOT"],
}

# The shared base sits at the mount height; the rig itself stays at the world origin.
shared_base_frame_to_root_frame = DEFAULT_TRANSFORMS["FR3_DUOMOUNT_HEIGHT_OFFSET"]
root_frame_to_world = rcs.common.Pose()
```

In plain language:

- the shared base frame sits at the logical center of the duo setup
- the left and right robot bases are offset from that center
- the whole setup can still be moved together by changing `root_frame_to_world`

### Object placement: which dictionary should I use?

#### `world_frame_objects`

Use this when the object should stay fixed in the room.

```python
world_frame_objects = {
    "cube": (OBJECT_PATHS["green_cube"], rcs.common.Pose(translation=np.array([0.5, 0.0, 0.2]))),
}
```

Example meaning: place a cube at a fixed world position.

#### `root_frame_objects`

Use this when the object belongs to the rig and should move together with it.

```python
root_frame_objects = {
    "duo_mount": (OBJECT_PATHS["fr3_duo_mount"], DEFAULT_TRANSFORMS["FR3_DUOMOUNT_BASE"]),
}
```

Example meaning: the duo mount is part of the setup, not a free world object.

#### `robot_frame_objects`

Use this when the object should be attached to one robot's `attachment_site`.

```python
robot_frame_objects = {
    "left": {
        "left_d405_mount": (
            OBJECT_PATHS["robotiq_d405_mount"],
            DEFAULT_TRANSFORMS["FR3_ROBOTIQ_WRIST_D405_MOUNT"],
        )
    }
}
```

Example meaning: attach a wrist mount to the left robot only.

## Camera depth units

When depth is enabled (`WrapperConfig.include_depth=True`), the camera wrapper exposes depth images as scaled metric depth:

- sim depth is first converted to **meters** in `python/rcs/camera/sim.py`
- camera frames use `BaseCameraSet.DEPTH_SCALE = 1000`
- depth is then stored as `uint16`

So in practice:

- divide by `1000` to get **meters**
- or read the values directly as **millimeters**

Example:

- `depth[y, x] == 1500` means the point is about **1.5 m** away from the camera

### Camera placement

#### Fixed scene camera

This pattern from `EmptyWorldFR3` adds an overhead camera:

```python
camera_adds = {
    "bird_eye": CameraAdderConfig(
        fovy=60.0,
        offset=rcs.common.Pose(
            translation=np.array([0.271, 0.0, 2.080]),
            quaternion=np.array([0.0060, -0.0060, -0.7067, 0.7074]),
        ),
    )
}
```

Because `robot_name` is not set, this pose is interpreted in the **root frame**.
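Since a scene camera's `offset` lives in the root frame, its world pose is `root_frame_to_world` composed with that offset, so the camera follows the rig when the rig moves. The NumPy sketch below reuses the `bird_eye` translation from the example above; the rotation is left out because this page does not state the component order of the `Pose` quaternion, and `make_pose`, `scene_camera_to_world`, and the moved rig placement are hypothetical.

```python
import numpy as np


def make_pose(translation, yaw_rad=0.0):
    """Hypothetical helper: 4x4 homogeneous transform with a rotation about z."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    pose = np.eye(4)
    pose[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    pose[:3, 3] = translation
    return pose


def scene_camera_to_world(root_frame_to_world, offset_in_root):
    """World pose of a camera added without robot_name (its offset is a root-frame pose)."""
    return root_frame_to_world @ offset_in_root


# Translation of the bird_eye example; its orientation is ignored in this sketch.
bird_eye_offset = make_pose([0.271, 0.0, 2.080])

# Identity root_frame_to_world (as in EmptyWorldFR3): the camera sits at its offset.
cam_default = scene_camera_to_world(np.eye(4), bird_eye_offset)

# Rig moved 2 m along world y and turned 180 degrees about z: the camera moves with it.
rig_moved = make_pose([0.0, 2.0, 0.0], yaw_rad=np.pi)
cam_moved = scene_camera_to_world(rig_moved, bird_eye_offset)
print(cam_moved[:3, 3])  # camera position in world coordinates
```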
#### Wrist camera

This pattern mounts a camera on a robot:

```python
camera_adds = {
    "wrist": CameraAdderConfig(
        fovy=60.0,
        offset=some_pose,  # placeholder: pose relative to the robot's attachment site
        robot_name="robot",
    )
}
```

Because `robot_name` is set, `offset` is interpreted relative to that robot's **attachment site**.

## Common mistakes

1. **Mixing up world and root frame**
   - If the whole rig should move together, use `root_frame_to_world` or `root_frame_objects`, not `world_frame_objects`.
2. **Using the wrong frame for camera offsets**
   - Scene cameras use root-frame offsets.
   - Robot-mounted cameras use attachment-site offsets.
3. **Forgetting to match camera names**
   - If a camera is added without `xml_path`, its name must also exist in `camera_cfgs`.
4. **Putting wrist-mounted assets into world objects**
   - Use `robot_frame_objects` for things that should follow the robot wrist.
5. **Using `alternative_combined_robot_mjcf` without the expected prefixes**
   - The docstring in `scenes.py` requires names like `robot{robot_name}`.

## A practical workflow

When building a new scene, this order usually works well:

1. Start with one robot and identity transforms.
2. Add `root_frame_objects` for mounts or fixtures.
3. Add `world_frame_objects` only for room-fixed props.
4. Add `robot_frame_objects` for wrist payloads.
5. Add cameras with `camera_adds`.
6. Only then introduce non-trivial `shared_base_frame_to_root_frame` and `robot_to_shared_base_frame` offsets.

That order keeps the frame reasoning much simpler.
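Mistake 3 in the list above can be caught before building the scene with a small consistency check. The sketch below is not part of rcs: it uses a hypothetical `CameraAdd` dataclass as a stand-in for `CameraAdderConfig` (keeping only `xml_path` and `robot_name`) and plain dictionaries, and it checks only the naming rule stated on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CameraAdd:
    """Hypothetical stand-in for CameraAdderConfig with only the fields this check reads."""
    xml_path: Optional[str] = None
    robot_name: Optional[str] = None


def missing_camera_cfgs(camera_adds, camera_cfgs):
    """Return, sorted, the names of cameras added without xml_path that lack a camera_cfgs entry."""
    return sorted(
        name
        for name, add in camera_adds.items()
        # Treat None and "" alike as "no XML asset given".
        if not add.xml_path and name not in camera_cfgs
    )


camera_adds = {
    "bird_eye": CameraAdd(),
    "wrist": CameraAdd(robot_name="robot"),
    "side": CameraAdd(),                            # no matching camera_cfgs entry
    "custom": CameraAdd(xml_path="my_camera.xml"),  # hypothetical XML asset, skipped by the rule
}
camera_cfgs = {"bird_eye": object(), "wrist": object()}  # placeholder camera configs

print(missing_camera_cfgs(camera_adds, camera_cfgs))  # ['side']
```

With real configs, the same function could be called on `camera_adds` and `camera_cfgs` as passed to `SimEnvCreatorConfig`, assuming both are dictionaries keyed by camera name as the key table suggests.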