Deep reinforcement learning (DRL) has achieved remarkable success in robot control. However, DRL with tactile feedback still faces challenges in contact-rich tasks involving visual occlusion or high-speed dynamics. The challenges stem from the complexity of real-world tactile sensors and the computational intensity of high-fidelity simulators.
To address this, we design a high-speed tactile simulation model, enabling efficient, large-scale DRL training on GPUs. We then propose the Contrastive Tactile (ConTact) framework, which leverages contrastive learning to align tactile features for sim-to-real transfer. ConTact employs a dedicated spatiotemporal encoder that explicitly models temporal changes to capture the dynamic features of contact events.
We validate ConTact on two manipulation tasks, Single Object Tracking (SOT) and Composite Object Tracking (COT), both of which rely solely on tactile information. Policies trained with ConTact in simulation are deployed directly in the real world without fine-tuning, achieving zero-shot transfer.
ConTact addresses the sim-to-real gap by aligning tactile features across the simulated and real domains. We first design a computationally efficient ray-casting tactile simulation model in MuJoCo MJX. Building on this simulator, we propose the Contrastive Tactile (ConTact) pre-training framework.
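To make the ray-casting idea concrete, the following is a minimal NumPy sketch, not the MJX implementation. Each taxel casts one ray along the sensor normal, and the depth to which an object intrudes into a soft sensor layer is converted to a normal force with a linear spring. The spherical object, the function name `taxel_forces`, and the `skin_thickness` and `stiffness` values are all illustrative assumptions.

```python
import numpy as np

def taxel_forces(taxel_xy, sphere_center, sphere_radius,
                 skin_thickness=0.002, stiffness=500.0):
    """Cast one ray per taxel along +z and return a normal force per taxel.

    taxel_xy:       (N, 2) taxel positions on the sensor plane z = 0.
    sphere_center:  (3,) center of a spherical object above the sensor.
    skin_thickness: height of the undeformed soft layer (assumed, metres).
    stiffness:      linear spring constant (assumed, N/m).
    """
    dx = taxel_xy[:, 0] - sphere_center[0]
    dy = taxel_xy[:, 1] - sphere_center[1]
    # A vertical ray hits the sphere only if it passes within one radius of the center.
    disc = sphere_radius**2 - dx**2 - dy**2
    hit = disc > 0.0
    # Height of the first intersection (lower sphere surface) along the ray.
    z_hit = sphere_center[2] - np.sqrt(np.where(hit, disc, 0.0))
    # Penetration into the soft layer; zero for misses or when there is no contact.
    depth = np.where(hit, np.clip(skin_thickness - z_hit, 0.0, skin_thickness), 0.0)
    return stiffness * depth

# 3 x 3 taxel grid with 4 mm pitch, pressed by a 10 mm sphere.
grid = np.array([[x, y] for x in (-0.004, 0.0, 0.004) for y in (-0.004, 0.0, 0.004)])
forces = taxel_forces(grid, np.array([0.0, 0.0, 0.0105]), 0.01)
print(forces.reshape(3, 3))
```

Because every taxel is handled by the same array expression, a model of this shape vectorizes naturally over taxels and environments, which is the property that makes GPU-scale training practical.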
As depicted above, ConTact leverages contrastive learning (using a symmetric contrastive loss \(\mathcal{L}_{CTA}\)) and a spatiotemporal encoder to align tactile features across the simulated and real domains. The resulting unified representations feed downstream DRL tasks, eliminating the need for real-world fine-tuning.
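The exact form of \(\mathcal{L}_{CTA}\) is not given here. One common symmetric choice is a CLIP-style InfoNCE loss averaged over the sim-to-real and real-to-sim directions, sketched below in NumPy. The function name, the `temperature` value, and the assumption that row i of each batch encodes the same contact event are ours, not the authors'.

```python
import numpy as np

def _log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def symmetric_contrastive_loss(z_sim, z_real, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired (sim, real) embeddings.

    Row i of z_sim and row i of z_real are assumed to describe the same contact.
    """
    z_sim = z_sim / np.linalg.norm(z_sim, axis=1, keepdims=True)
    z_real = z_real / np.linalg.norm(z_real, axis=1, keepdims=True)
    logits = z_sim @ z_real.T / temperature  # (B, B) scaled cosine similarities
    idx = np.arange(len(logits))
    loss_s2r = -_log_softmax(logits)[idx, idx].mean()    # sim -> real direction
    loss_r2s = -_log_softmax(logits.T)[idx, idx].mean()  # real -> sim direction
    return 0.5 * (loss_s2r + loss_r2s)

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = symmetric_contrastive_loss(z, z + 0.01 * rng.normal(size=z.shape))
shuffled = symmetric_contrastive_loss(z, z[::-1])
print(aligned, shuffled)
```

Well-aligned pairs give a much smaller loss than mismatched ones, so minimizing a loss of this kind pulls the simulated and real embeddings of the same contact together while pushing other pairs apart.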
Real-world vs Simulated Tactile Array
We introduce two manipulation tasks to evaluate our framework: Single Object Tracking (SOT) and Composite Object Tracking (COT).
For both tasks, the policy relies solely on tactile features and proprioception, without any visual input.
Real-world Input vs Reconstructed Signal
t-SNE visualization of aligned latent features
We evaluate the capability to stabilize objects during dynamic movements across four distinct reference trajectories: linear, circular, lemniscate, and ascending helical. The policy demonstrates robustness against dynamic challenges, including real-time object switching, varied initial positions, and external perturbations (e.g., being tapped by a hammer).
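For readers who want to reproduce similar reference paths, the sketch below generates the four trajectory shapes named above as functions of time. The `radius`, `period`, and `climb_rate` values, the back-and-forth form of the linear path, and the use of a lemniscate of Gerono for the figure-eight are placeholder assumptions rather than the parameters used in the experiments.

```python
import numpy as np

def reference_trajectory(kind, t, radius=0.1, period=4.0, climb_rate=0.02):
    """Return (len(t), 3) waypoints for one of four reference paths.

    radius (m), period (s), and climb_rate (m/s) are placeholder values.
    """
    t = np.asarray(t, dtype=float)
    w = 2.0 * np.pi / period
    zeros = np.zeros_like(t)
    if kind == "linear":
        # Back-and-forth motion along x.
        xyz = (radius * np.sin(w * t), zeros, zeros)
    elif kind == "circular":
        xyz = (radius * np.cos(w * t), radius * np.sin(w * t), zeros)
    elif kind == "lemniscate":
        # Figure-eight (lemniscate of Gerono).
        xyz = (radius * np.sin(w * t), radius * np.sin(w * t) * np.cos(w * t), zeros)
    elif kind == "helical":
        # Circle in the xy-plane with steadily increasing height.
        xyz = (radius * np.cos(w * t), radius * np.sin(w * t), climb_rate * t)
    else:
        raise ValueError(f"unknown trajectory kind: {kind}")
    return np.stack(xyz, axis=-1)

t = np.linspace(0.0, 4.0, 101)
paths = {k: reference_trajectory(k, t) for k in ("linear", "circular", "lemniscate", "helical")}
```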
Our force-based tactile model significantly outperforms binary-signal baselines. As shown in our ablation studies, binary signals are ambiguous and unable to differentiate unstable states in the Composite Object Tracking task.
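The ambiguity of binary signals can be illustrated with a toy example. The numbers below are invented for illustration and are not measured data: two different force distributions on a five-taxel strip collapse to the same contact pattern once thresholded, while a force-weighted pressure center still separates them. The helper names and the `threshold` value are assumptions.

```python
import numpy as np

def binarize(forces, threshold=0.05):
    """Collapse per-taxel forces (N) to contact / no-contact flags."""
    return (np.asarray(forces) > threshold).astype(int)

def pressure_center(forces):
    """Force-weighted mean taxel index along a 1-D sensor strip."""
    forces = np.asarray(forces, dtype=float)
    return float((forces * np.arange(len(forces))).sum() / forces.sum())

balanced = [0.0, 0.4, 0.8, 0.4, 0.0]  # load centered on the strip
tilting = [0.0, 0.1, 0.3, 1.2, 0.0]   # load shifting toward one edge

print(binarize(balanced), binarize(tilting))                # identical flags
print(pressure_center(balanced), pressure_center(tilting))  # different centers
```

A policy that sees only the flags cannot tell these two states apart, whereas the force values reveal that the load is drifting toward one edge.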
@article{contact2025,
title={ConTact: Contrastive Tactile Alignment for Sim-to-Real Robotic Manipulation},
author={Anonymous Authors},
journal={Submission},
year={2025}
}