Safer Reinforcement Learning for Life-long Adaptation

Reinforcement learning is based on the idea of learning through exploration, in other words, trial and error. However, trying out different options in an environment without any restrictions is inherently risky: the agent might try behaviors that lead to catastrophic outcomes from which recovery or further learning is impossible. While this is not necessarily a problem in simulated environments, it becomes a serious obstacle if we would like these systems to someday work well in the real world. For example, a factory robot cannot simply try out actions at random; it has to make sure that the actions it tries pose no danger to the humans working alongside it.

The goal of this project is to develop a system that provides agents with an “instinct”, developed over a much longer evolutionary timescale, that guides them away from unsafe exploration and thereby facilitates life-long learning.
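To make the idea concrete, the following is a minimal sketch of how a fixed instinct could damp an exploring policy's actions near a hazard. It is an illustration under stated assumptions, not the method from the publications listed below: the 1-D corridor, the hazard position, and the names `propose_action`, `instinct`, `safe_action`, and `rollout` are all hypothetical, and the instinct here is hand-coded, whereas in the project it would be the product of evolution.

```python
import random

HAZARD = 1.0   # assumed: positions >= HAZARD are catastrophic
MARGIN = 0.3   # assumed: distance at which the instinct starts to act
STEP = 0.1     # assumed: position change per unit of action


def propose_action(rng):
    """Hypothetical exploring policy: a random action clipped to [-1, 1]."""
    return max(-1.0, min(1.0, rng.gauss(0.0, 1.0)))


def instinct(position):
    """Hypothetical fixed instinct.

    Returns (suppression, corrective_action). Suppression is 0 far from
    the hazard and rises linearly to 1 as the agent reaches it.
    """
    distance = HAZARD - position
    suppression = min(1.0, max(0.0, 1.0 - distance / MARGIN))
    corrective_action = -1.0  # always steer away from the hazard
    return suppression, corrective_action


def safe_action(position, proposed):
    """Blend the policy's proposal with the instinct's correction."""
    suppression, corrective = instinct(position)
    return (1.0 - suppression) * proposed + suppression * corrective


def rollout(steps=1000, seed=0):
    """Run the exploring agent and return the largest position reached."""
    rng = random.Random(seed)
    position, max_position = 0.0, 0.0
    for _ in range(steps):
        action = safe_action(position, propose_action(rng))
        position += STEP * action
        max_position = max(max_position, position)
    return max_position
```

With these particular constants, the blended action can never move the agent by more than its remaining distance to the hazard, so exploration continues freely in the safe region while the hazardous region is never entered. Calling `rollout()` returns the furthest position reached during a run.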

Research Output:
Grbic, Djordje, and Sebastian Risi. “Towards continual reinforcement learning through evolutionary meta-learning.” Proceedings of the Genetic and Evolutionary Computation Conference Companion. 2019.

Grbic, Djordje, and Sebastian Risi. “Safe Reinforcement Learning through Meta-learned Instincts.” arXiv preprint arXiv:2005.03233 (2020).