
Corruption Robust Exploration in Episodic Reinforcement Learning

lib:be8e04c29964f3c6 (v1.0.0)

Authors: Thodoris Lykouris, Max Simchowitz, Aleksandrs Slivkins, Wen Sun
ArXiv: 1911.08689
Abstract URL: https://arxiv.org/abs/1911.08689v1


We initiate the study of multi-stage episodic reinforcement learning under adversarial manipulations in both the rewards and the transition probabilities of the underlying system. Existing efficient algorithms heavily rely on the "optimism under uncertainty" principle which dictates their behavior and does not allow flexibility to perform corruption-robust exploration. We address this by (i) departing from the optimistic behavior, and (ii) creating a general framework that incorporates the principle of action-elimination. (This principle has been essential for corruption-robust exploration in multi-armed bandits, a degenerate special case of episodic reinforcement learning.) Despite constructing a lower bound for a straightforward implementation of action-elimination, we provide a clean and modular way to transfer it to episodic reinforcement learning. Our algorithm enjoys near-optimal guarantees in the absence of adversarial manipulations, has performance that degrades gracefully as the amount of corruption increases, and does not need to know this amount. Our results shed new light on the broader question of robust exploration, and suggest a way to address a rather daunting mismatch between optimistic algorithms and algorithms with higher flexibility. To demonstrate the applicability of our framework, we provide a second instantiation thereof, showing how it can provide efficient guarantees for the stochastic setting, despite doing almost uniform exploration across plausibly optimal actions.
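The action-elimination principle mentioned in the abstract can be illustrated in its simplest form, the stochastic multi-armed bandit. The sketch below is the classical successive-elimination routine, not the paper's algorithm: it has no corruption robustness and no multi-stage structure. The function name `successive_elimination`, the Bernoulli reward model, and the Hoeffding-style confidence radius are all assumptions made for illustration.

```python
import math
import random


def successive_elimination(means, horizon, delta=0.05, seed=0):
    """Classical successive elimination on a Bernoulli bandit (illustrative only).

    means   : hypothetical true success probabilities, one per arm
    horizon : total number of pulls allowed
    delta   : failure probability used in the confidence radius
    Returns the list of arms still active and the pull count of every arm.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    active = list(range(n_arms))
    pulls = [0] * n_arms
    totals = [0.0] * n_arms
    t = 0

    def radius(arm):
        # Hoeffding-style radius with a union bound over arms and rounds.
        return math.sqrt(math.log(2 * n_arms * horizon / delta) / (2 * pulls[arm]))

    while t < horizon:
        # Explore uniformly: pull every arm that is still plausibly optimal.
        for arm in active:
            if t >= horizon:
                break
            reward = 1.0 if rng.random() < means[arm] else 0.0
            pulls[arm] += 1
            totals[arm] += reward
            t += 1

        # The budget ran out before every active arm was pulled once.
        if any(pulls[arm] == 0 for arm in active):
            break

        # Eliminate arms whose upper bound falls below the best lower bound.
        best_lower = max(totals[a] / pulls[a] - radius(a) for a in active)
        active = [a for a in active if totals[a] / pulls[a] + radius(a) >= best_lower]

    return active, pulls
```

Calling `successive_elimination([0.9, 0.1], horizon=2000)` returns the surviving arms together with the per-arm pull counts. Arms are never favored for looking best, only dropped once they are provably worse, which is the behavior the abstract contrasts with optimism under uncertainty.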
