We present a novel method for
achieving dexterous manipulation of complex objects, while
simultaneously securing the object without the use of passive
support surfaces. We posit that a key obstacle to training such
policies in a Reinforcement Learning framework is the difficulty
of exploring the problem state space, as its accessible regions
form a complex structure along manifolds of a
high-dimensional space. To address this challenge, we use two
versions of the non-holonomic Rapidly-exploring Random Trees (RRT)
algorithm: one version is more general, but requires explicit
use of the environment’s transition function, while the second
version uses manipulation-specific kinematic constraints to attain
better sample efficiency. In both cases, we use states found via
sampling-based exploration to generate reset distributions that
enable training control policies under full dynamic constraints via
model-free Reinforcement Learning. We show that these policies
solve manipulation problems of higher difficulty than
previously demonstrated, and also transfer effectively to real robots.
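
As a rough illustration of the first, more general RRT variant (the one that queries the environment's transition function), here is a minimal sketch of the exploration-then-reset pipeline described above. The environment interface used here (`sample_state`, `sample_action`, `transition`, `set_state`) is entirely hypothetical and stands in for whatever simulator is actually used; the second, kinematics-aware variant would replace the random-action propagation with manipulation-specific constraints.

```python
import random

import numpy as np


class KinodynamicRRT:
    """Minimal non-holonomic RRT sketch: the tree grows by forward-
    simulating sampled actions through the environment's transition
    function, so every node is a dynamically reachable state."""

    def __init__(self, env, n_action_samples=8):
        self.env = env                      # hypothetical env interface
        self.n_action_samples = n_action_samples
        self.nodes = [env.reset()]          # tree stored as a list of states

    def extend(self):
        # Sample a target point in state space; find the nearest tree node.
        target = self.env.sample_state()    # assumed helper
        near = min(self.nodes, key=lambda s: np.linalg.norm(s - target))
        # Propagate several random actions from the nearest node and keep
        # the one whose resulting state lands closest to the target.
        best_state, best_dist = None, np.inf
        for _ in range(self.n_action_samples):
            action = self.env.sample_action()          # assumed helper
            nxt = self.env.transition(near, action)    # assumed one-step dynamics
            dist = np.linalg.norm(nxt - target)
            if dist < best_dist:
                best_state, best_dist = nxt, dist
        self.nodes.append(best_state)

    def explore(self, n_iters=5000):
        for _ in range(n_iters):
            self.extend()
        return self.nodes


def make_reset_distribution(tree_nodes):
    """Turn the explored states into a reset distribution: each RL
    episode starts from a state the tree has shown to be reachable."""
    def sample_reset():
        return random.choice(tree_nodes)
    return sample_reset
```

During policy training, each episode would then begin with something like `env.set_state(sample_reset())` before rolling out the policy, so that model-free RL optimizes behavior under full dynamic constraints while state-space coverage comes from the tree.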
For more details of our method, please see the preprint on arXiv. Further supporting visualizations can be found in the video below.