A Compilation of Papers on Experimental Rigour in Machine Learning

Last updated on Aug 15, 2024 Scientific Methodology

Overview

The following list is a compilation of papers on scientific methodology and best practices in Machine Learning, sometimes with a special focus on Reinforcement Learning. The intention is to create a strong starting point for folks who are interested in ensuring rigour in their experiments. The list was compiled with the help of amazing folks at Mila and in RLAI at UAlberta.

The list

Empirical Design in Reinforcement Learning
- If I were starting out in RL research, or if I needed to pick just one paper, I’d pick this one!
Deep Reinforcement Learning that Matters
Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents
How Many Random Seeds? Statistical Power Analysis in Deep Reinforcement Learning Experiments
Generalized Domains for Empirical Evaluations in Reinforcement Learning
An empirical analysis of reinforcement learning using design of experiments
Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control
Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO
The Scientific Method in the Science of Machine Learning
Deterministic Implementations for Reproducibility in Deep Reinforcement Learning
Improving Reproducibility in Machine Learning Research
Evaluating the Performance of Reinforcement Learning Algorithms
Quantifying Generalization in Reinforcement Learning
The Impact of Determinism on Learning Atari 2600 Games
The Cross-environment Hyperparameter Setting Benchmark for Reinforcement Learning
A Study on Overfitting in Deep Reinforcement Learning
Deep Reinforcement Learning at the Edge of the Statistical Precipice
On Bonus Based Exploration Methods In The Arcade Learning Environment
Rigorous Experimentation For Reinforcement Learning
AdaStop: adaptive statistical testing for sound comparisons of Deep RL agents
Revisiting Rainbow: Promoting more Insightful and Inclusive Deep Reinforcement Learning Research
On the consistency of hyper-parameter selection in value-based deep reinforcement learning

Remarks

Lastly, if you have suggestions for papers that could be added to this list, send me an email or open an issue in my website’s GitHub repo.

Esra'a Saleh