Off-Policy Evolutionary Reinforcement Learning with Maximum Mutations

Karush Suri

$21^{\text{st}}$ International Conference on Autonomous Agents & Multi-Agent Systems (AAMAS 2022)

Oral Presentation

Preprint  Blog  Github

Evolution-based Soft Actor-Critic (ESAC) is an algorithm that combines Evolution Strategies (ES) with Soft Actor-Critic (SAC), achieving performance equivalent to SAC and scalability comparable to ES. ESAC separates exploration from exploitation by exploring policies in weight space and optimizing returns in value-function space. The framework makes use of a novel soft winner-selection function and carries out genetic crossovers in hindsight. ESAC also introduces Automatic Mutation Tuning (AMT), which maximizes the mutation rate of ES within a small clipped region and provides significant hyperparameter robustness.

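As a rough illustration of how these pieces fit together, the sketch below runs an ES-style outer loop on a toy objective: policy weights are perturbed in parameter space, a winner is sampled via a softmax over returns (a soft stand-in for the winner-selection function), and the mutation rate is increased multiplicatively inside a small clipped region. The function names, the toy objective, and the specific update rules are illustrative assumptions rather than the paper's exact implementation, which additionally trains the policy with SAC gradients on off-policy data.

```python
import numpy as np

# Schematic sketch of an ESAC-style outer loop on a toy objective.
# toy_return, soft_winner, amt_update and their update rules are
# illustrative assumptions, not the paper's exact implementation.

rng = np.random.default_rng(0)

def toy_return(theta):
    """Stand-in for an episode return; higher is better."""
    return -float(np.sum(theta ** 2))

def soft_winner(population, returns, temperature=1.0):
    """Sample a winner from a softmax over returns
    (a soft analogue of picking the arg-max individual)."""
    probs = np.exp((returns - returns.max()) / temperature)
    probs /= probs.sum()
    idx = rng.choice(len(population), p=probs)
    return population[idx]

def amt_update(sigma, ratio, eps=0.2, sigma_max=1.0):
    """Grow the mutation rate multiplicatively, clipped to a small
    region around 1 (assumed form of the AMT rule)."""
    return min(sigma * float(np.clip(ratio, 1.0, 1.0 + eps)), sigma_max)

theta = rng.normal(size=8)   # flattened policy weights
sigma = 0.1                  # ES mutation rate
pop_size = 16
best_return = toy_return(theta)

for generation in range(50):
    # ES exploration: perturb weights in parameter space and evaluate.
    noise = rng.normal(size=(pop_size, theta.size))
    population = theta + sigma * noise
    returns = np.array([toy_return(p) for p in population])

    # Soft winner selection in return space.
    winner = soft_winner(population, returns)

    # In ESAC the winner would be combined with the SAC-trained policy
    # (crossover in hindsight); here we simply move toward the winner.
    theta = 0.5 * theta + 0.5 * winner

    # Maximize the mutation rate inside a small clipped region.
    ratio = np.exp(returns.max() - best_return)  # > 1 when the population improved
    sigma = amt_update(sigma, ratio)
    best_return = max(best_return, returns.max())

print("final return:", toy_return(theta))
```
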
Concepts and applications of Reinforcement Learning (RL) have seen tremendous growth over the past decade, spanning arcade games, board games and, more recently, robotic control tasks. A primary reason for this growth is the use of computationally efficient function approximators such as neural networks. Modern RL algorithms employ parallelization to reduce training times and boost the agent's performance through effective exploration, giving rise to scalable methods commonly referred to as Scalable Reinforcement Learning (SRL). However, a number of open problems, such as approximation bias, lack of scalability over long time horizons and lack of diverse exploration, restrict the application of SRL to complex control and robotic tasks.

@misc{karush17,
  title={Off-Policy Evolutionary Reinforcement Learning with Maximum Mutations},
  author={Karush Suri},
  year={2021},
  url={https://nbviewer.jupyter.org/github/karush17/karush17.github.io/blob/master/_pages/temp4.pdf}
}