Retargeting User Guide

API Reference

Basic Usage

from wuji_retargeting import Retargeter

# Create retargeter from configuration file
retargeter = Retargeter.from_yaml("config/adaptive_analytical_avp.yaml", hand_side="right")

# Retarget: convert 21 keypoints to 20 joint angles.
# raw_keypoints is a (21, 3) array in meters (see Input Format).
qpos = retargeter.retarget(raw_keypoints)  # (21, 3) -> (20,), radians

Input Format

Input data is hand keypoints in MediaPipe format with shape (21, 3): the 3D coordinates of 21 keypoints, in meters. Whether the source is Vision Pro, replay data, MP4 video, or Intel RealSense, the input pipeline normalizes the data into this format.

Unit Conversion: Input keypoints use meters (MediaPipe standard), but are automatically converted to centimeters internally (M_TO_CM = 100.0). Therefore, all distance-related parameters in configuration files (e.g., huber_delta, pinch_thresholds.d1/d2) should be specified in centimeters.
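To make the unit convention concrete, here is a minimal sketch of comparing a distance against centimeter-valued thresholds. The two sample points are made up, and treating the pinch distance as the thumb-tip-to-fingertip gap is an assumption for illustration, not something this guide states.

```python
import numpy as np

M_TO_CM = 100.0  # same constant name as in the text above

# Hypothetical thumb-tip and index-tip positions, in meters as delivered by the input pipeline.
thumb_tip_m = np.array([0.02, 0.01, 0.00])
index_tip_m = np.array([0.05, 0.01, 0.00])

# Distance in centimeters, the unit expected by huber_delta and pinch_thresholds.
pinch_distance_cm = float(np.linalg.norm(index_tip_m - thumb_tip_m) * M_TO_CM)

# Example thresholds in centimeters, matching the d1/d2 values used in this guide.
D1_CM, D2_CM = 2.0, 4.0
between_thresholds = D1_CM < pinch_distance_cm < D2_CM
```

A threshold written as 0.02 (meters) instead of 2.0 (centimeters) would be off by a factor of 100, which is the mistake this convention most often causes.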

Keypoint index mapping:

| Index | Keypoint |
|-------|----------|
| 0     | Wrist |
| 1-4   | Thumb (CMC → TIP) |
| 5-8   | Index finger (MCP → TIP) |
| 9-12  | Middle finger (MCP → TIP) |
| 13-16 | Ring finger (MCP → TIP) |
| 17-20 | Pinky (MCP → TIP) |
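The index mapping can be turned into slices for splitting a (21, 3) array per finger. This is a sketch only: the names FINGER_SLICES, split_keypoints, and fingertip_indices are hypothetical helpers, not part of wuji_retargeting.

```python
import numpy as np

# Index ranges copied from the table above (slice ends are exclusive in Python).
FINGER_SLICES = {
    "thumb": slice(1, 5),
    "index": slice(5, 9),
    "middle": slice(9, 13),
    "ring": slice(13, 17),
    "pinky": slice(17, 21),
}
WRIST_INDEX = 0


def split_keypoints(keypoints):
    """Return the wrist point and a dict of (4, 3) arrays, one per finger."""
    keypoints = np.asarray(keypoints, dtype=float)
    if keypoints.shape != (21, 3):
        raise ValueError(f"expected shape (21, 3), got {keypoints.shape}")
    fingers = {name: keypoints[s] for name, s in FINGER_SLICES.items()}
    return keypoints[WRIST_INDEX], fingers


def fingertip_indices():
    """Last index of each finger range, i.e. the TIP keypoints."""
    return [s.stop - 1 for s in FINGER_SLICES.values()]
```

Checking the shape up front gives a clear error when a source delivers, for example, a flattened (63,) array.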

Output Format

Output is a length-20 joint angle array (radians), corresponding to Wuji Hand's 20 joints:

| Index | Joint |
|-------|-------|
| 0-3   | Thumb joint1-4 |
| 4-7   | Index finger joint1-4 |
| 8-11  | Middle finger joint1-4 |
| 12-15 | Ring finger joint1-4 |
| 16-19 | Pinky joint1-4 |
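Because each finger occupies four consecutive entries, the output can be viewed as a 5 x 4 grid. The sketch below assumes only the layout in the table above; split_qpos is a hypothetical helper, not a library function.

```python
import numpy as np

FINGER_ORDER = ["thumb", "index", "middle", "ring", "pinky"]
JOINTS_PER_FINGER = 4


def split_qpos(qpos):
    """Group a length-20 joint vector into per-finger arrays of joint1..joint4."""
    qpos = np.asarray(qpos, dtype=float)
    if qpos.shape != (len(FINGER_ORDER) * JOINTS_PER_FINGER,):
        raise ValueError(f"expected shape (20,), got {qpos.shape}")
    per_finger = qpos.reshape(len(FINGER_ORDER), JOINTS_PER_FINGER)
    return {name: per_finger[i] for i, name in enumerate(FINGER_ORDER)}


def to_degrees(qpos):
    """Convert radians to degrees, for logging or display only."""
    return np.degrees(np.asarray(qpos, dtype=float))
```

For example, split_qpos(qpos)["index"] holds entries 4 to 7 of the original vector.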

Configuration File

optimizer:
  type: "AdaptiveOptimizerAnalytical"

retarget:
  # Huber loss thresholds
  huber_delta: 2.0             # Position Huber threshold (cm)
  huber_delta_dir: 0.5         # Direction Huber threshold

  # Loss weights
  w_pos: 1.0           # Fingertip position weight
  w_dir: 10.0          # Fingertip direction weight
  w_full_hand: 1.0     # Full hand weight

  # Regularization
  norm_delta: 0.04     # Velocity regularization weight

  # Scaling
  scaling: 1.0         # Global scaling

  # Per-finger segment scaling [PIP, DIP, TIP]
  segment_scaling:
    thumb:  [1.0, 1.0, 1.0]
    index:  [1.0, 1.03, 1.05]
    middle: [1.0, 1.0, 1.0]
    ring:   [1.0, 1.0, 1.0]
    pinky:  [1.05, 1.15, 1.15]

  # Pinch thresholds (cm)
  pinch_thresholds:
    index:  { d1: 2.0, d2: 4.0 }
    middle: { d1: 2.0, d2: 4.0 }
    ring:   { d1: 2.0, d2: 4.0 }
    pinky:  { d1: 2.0, d2: 4.0 }

  # Low-pass filter (0~1, lower is smoother)
  lp_alpha: 0.2

The example above reflects the default Vision Pro configuration style. For MediaPipe vision sources such as video and RealSense, use the dedicated adaptive_analytical_video.yaml configuration.

Configuration by Input Mode

| Configuration file | Use case | Notes |
|--------------------|----------|-------|
| config/adaptive_analytical_avp.yaml | Apple Vision Pro | Default configuration tuned for stereo Vision Pro tracking data |
| config/adaptive_analytical_video.yaml | MP4 video / Intel RealSense | Tuned for monocular or RGB vision input in the current RGB/MediaPipe pipeline, with noisier landmarks and weaker depth cues |
| config/adaptive_analytical_wuji_glove_{left,right}.yaml | Wuji Glove → Wuji Hand | Wuji Glove input driving the default Wuji Hand |
| config/adaptive_analytical_wuji_glove_wh120_{left,right}.yaml | Wuji Glove → Wuji Hand 2 | Wuji Glove input driving Wuji Hand 2. Points the optimizer at the Wuji Hand 2 model via optimizer.urdf_path / mjcf_path / link_naming |

Driving a Custom Dexterous Hand

Three keys under the optimizer block point the same retargeting algorithm at different hand models without code changes:

| Field | Purpose |
|-------|---------|
| urdf_path | URDF used by IK (Pinocchio, meshes not loaded) |
| mjcf_path | MJCF used by simulation (MuJoCo, meshes loaded) |
| link_naming | Maps the algorithm's logical roles (palm, per-finger PIP / DIP / TIP / MCP-plane reference) onto the URDF's actual link names |

link_naming supports a prefix (for example l_ for the Wuji Hand 2 left hand) and the {finger} placeholder. Wuji Hand 2 uses anatomical names (r_wrist, r_index_finger_pip, ...), configured as:

optimizer:
  link_naming:
    prefix: "r_"
    palm:    "wrist"
    fingers: [thumb, index_finger, middle_finger, ring_finger, pinky]
    pip:     "{finger}_middle"
    dip:     "{finger}_distal"
    tip:     "{finger}_tip"
    link1:   "{finger}_proximal_abd"

When the URDF and MJCF declare joints in different orders (for example Wuji Hand 2 puts index first in the URDF), output qpos is automatically reordered by joint name to the MJCF or device order, so finger angles do not land on the wrong joint. If optimizer.mjcf_path is set but joint names cannot be aligned, the sim and real-hardware entry points fail loudly at startup instead of silently driving the wrong joints.
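Reordering by joint name, with a hard failure when the names do not line up, could look roughly like the sketch below. It is not the library's implementation; reorder_by_joint_name and the joint names in the usage note are hypothetical.

```python
import numpy as np


def reorder_by_joint_name(qpos, source_names, target_names):
    """Permute qpos from source joint order to target joint order by name."""
    source_names = list(source_names)
    target_names = list(target_names)
    if len(set(source_names)) != len(source_names):
        raise ValueError("duplicate joint names in source order")
    if sorted(source_names) != sorted(target_names):
        mismatch = sorted(set(source_names) ^ set(target_names))
        raise ValueError(f"joint names cannot be aligned: {mismatch}")
    qpos = np.asarray(qpos, dtype=float)
    if qpos.shape != (len(source_names),):
        raise ValueError("qpos length does not match the number of joints")
    # Map each target joint to its position in the source vector.
    index_of = {name: i for i, name in enumerate(source_names)}
    permutation = [index_of[name] for name in target_names]
    return qpos[permutation]
```

For instance, with a source order of ["index_j1", "thumb_j1"] and a target order of ["thumb_j1", "index_j1"], the two values are swapped; if either list contains a name the other lacks, the function raises instead of returning a misaligned vector.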

For video and realsense modes, the example scripts automatically switch to config/adaptive_analytical_video.yaml. This configuration typically:

  • uses stronger smoothing and regularization to reduce jitter from vision input
  • adjusts segment_scaling to compensate for finger-length distortion in monocular landmarks
  • widens pinch_thresholds for better stability in noisy conditions
  • adds mediapipe_rotation to compensate for systematic orientation offsets

video_input Parameters

adaptive_analytical_video.yaml also includes a video_input section for preprocessing video and RealSense input:

| Parameter | Description |
|-----------|-------------|
| z_scale | Amplifies MediaPipe monocular depth variation to counter depth compression |
| correct_segments | Whether to correct finger segment lengths using reference hand proportions |
| reference_wrist_to_mid_mcp | Reference length used for global scaling (wrist to middle MCP distance) |
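One plausible reading of z_scale and reference_wrist_to_mid_mcp is sketched below. The details are assumptions: depth is taken to be the z axis, points are made wrist-relative first, and the reference length uses the same unit as the keypoints. correct_segments is not covered, and preprocess_video_keypoints is a hypothetical name.

```python
import numpy as np

WRIST, MIDDLE_MCP = 0, 9  # indices from the keypoint index mapping


def preprocess_video_keypoints(keypoints, z_scale, reference_wrist_to_mid_mcp):
    """Amplify depth, then rescale so wrist-to-middle-MCP matches the reference."""
    pts = np.asarray(keypoints, dtype=float).copy()
    if pts.shape != (21, 3):
        raise ValueError(f"expected shape (21, 3), got {pts.shape}")
    # Work relative to the wrist so scaling does not move the hand origin.
    pts -= pts[WRIST]
    # Assumed: the third coordinate is the compressed monocular depth.
    pts[:, 2] *= z_scale
    measured = np.linalg.norm(pts[MIDDLE_MCP])
    if measured < 1e-9:
        raise ValueError("wrist and middle MCP coincide; cannot compute scale")
    # Global scaling toward the reference hand size.
    pts *= reference_wrist_to_mid_mcp / measured
    return pts
```

In this sketch the depth amplification happens before the global rescale, so a larger z_scale changes the hand's shape but not its overall size.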

Parameter Descriptions

| Parameter | Default | Description |
|-----------|---------|-------------|
| huber_delta | 2.0 | Position Huber threshold (cm) |
| huber_delta_dir | 0.5 | Direction Huber threshold |
| w_pos | 1.0 | Fingertip position loss weight |
| w_dir | 10.0 | Fingertip direction loss weight |
| w_full_hand | 1.0 | Full hand loss weight |
| norm_delta | 0.04 | Velocity regularization weight |
| scaling | 1.0 | Global scaling factor |
| segment_scaling | - | Per-finger segment scaling |
| pinch_thresholds | - | Pinch distance thresholds (cm) |
| lp_alpha | 0.2 | Low-pass filter coefficient |
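For intuition about huber_delta and lp_alpha, the sketch below uses the textbook Huber loss and a first-order exponential low-pass filter. Whether the optimizer uses exactly these forms is an assumption; the filter is written so that a lower alpha gives smoother output, which matches the configuration comment.

```python
import numpy as np


def huber(residual, delta):
    """Textbook Huber loss: quadratic up to delta, linear beyond it."""
    r = np.abs(np.asarray(residual, dtype=float))
    return np.where(r <= delta, 0.5 * r**2, delta * (r - 0.5 * delta))


class LowPass:
    """Exponential smoothing: y = alpha * x + (1 - alpha) * y_prev."""

    def __init__(self, alpha):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.state = None

    def __call__(self, x):
        x = np.asarray(x, dtype=float)
        if self.state is None:
            # The first sample initializes the filter state unchanged.
            self.state = x.copy()
        else:
            self.state = self.alpha * x + (1.0 - self.alpha) * self.state
        return self.state
```

With delta = 2.0, a residual of 1 cm is penalized quadratically while a residual of 4 cm is penalized linearly, which limits the pull of outlier keypoints. With alpha = 0.2, each new sample contributes 20% of the filtered value.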

The default configuration is tuned for Apple Vision Pro. For MP4 video and Intel RealSense, start from adaptive_analytical_video.yaml, then adjust scaling, segment_scaling, and the video_input parameters to suit hand size and image conditions.

Hand Size Adaptation

If retargeting results are poor, adjust the following parameters:

Global Scaling: Adjust the scaling parameter to match palm size. Set one value, for example:

scaling: 1.1    # Increase for larger hands
# scaling: 0.9  # Decrease for smaller hands

Finger Length: Adjust each finger segment's length ratio via segment_scaling

segment_scaling:
  index: [1.0, 1.05, 1.1]  # Index finger segment scaling: [PIP, DIP, TIP]
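The geometric effect of these factors can be pictured with the sketch below. It assumes each [PIP, DIP, TIP] factor scales the bone that ends at that keypoint and that scaling multiplies every bone, including wrist to MCP. This guide does not spell out those rules, so treat them, and the scale_finger helper, as illustrative only.

```python
import numpy as np


def scale_finger(wrist, finger_points, segment_scaling, scaling=1.0):
    """Rebuild one finger chain (MCP, PIP, DIP, TIP) with scaled bone lengths."""
    wrist = np.asarray(wrist, dtype=float)
    pts = np.asarray(finger_points, dtype=float)
    if pts.shape != (4, 3) or len(segment_scaling) != 3:
        raise ValueError("expected 4 finger points and 3 segment factors")
    out = np.empty_like(pts)
    # The base of the finger moves with the global factor only.
    out[0] = wrist + scaling * (pts[0] - wrist)
    for i, factor in enumerate(segment_scaling, start=1):
        bone = pts[i] - pts[i - 1]
        # Each bone keeps its direction; only its length changes.
        out[i] = out[i - 1] + scaling * factor * bone
    return out
```

For a straight finger with unit-length bones and factors [1.0, 1.05, 1.1], the fingertip ends up 0.15 units farther from the wrist than in the unscaled chain.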