A Compilation of Papers on Experimental Rigour in Machine Learning
Overview
The following list is a compilation of papers on scientific methodology and best practices in Machine Learning, often with a special focus on Reinforcement Learning. The intention is to create a strong starting point for folks who are interested in ensuring rigour in their experiments. The list was compiled with the help of amazing folks at Mila and in RLAI at UAlberta.
The list
- Empirical Design in Reinforcement Learning
  - If I were starting out in RL research, or if I needed to pick just one paper, I’d pick this one!
- Deep Reinforcement Learning that Matters
- Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents
- How Many Random Seeds? Statistical Power Analysis in Deep Reinforcement Learning Experiments
- Generalized Domains for Empirical Evaluations in Reinforcement Learning
- An empirical analysis of reinforcement learning using design of experiments
- Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control
- Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO
- The Scientific Method in the Science of Machine Learning
- Deterministic Implementations for Reproducibility in Deep Reinforcement Learning
- Improving Reproducibility in Machine Learning Research
- Evaluating the Performance of Reinforcement Learning Algorithms
- Quantifying Generalization in Reinforcement Learning
- The Impact of Determinism on Learning Atari 2600 Games
- The Cross-environment Hyperparameter Setting Benchmark for Reinforcement Learning
- A Study on Overfitting in Deep Reinforcement Learning
- Deep Reinforcement Learning at the Edge of the Statistical Precipice
- On Bonus Based Exploration Methods In The Arcade Learning Environment
- Rigorous Experimentation For Reinforcement Learning
- AdaStop: adaptive statistical testing for sound comparisons of Deep RL agents
- Revisiting Rainbow: Promoting more Insightful and Inclusive Deep Reinforcement Learning Research
- On the consistency of hyper-parameter selection in value-based deep reinforcement learning
Remarks
Lastly, if you have suggestions for papers to add to this list, send me an email or open an issue on my website’s GitHub repo.