A Compilation of Papers on Experimental Rigour in Machine Learning
Overview
The following is a compilation of papers on scientific methodology and best practices in Machine Learning, with a special focus on Reinforcement Learning. The intention is to provide a strong starting point for anyone interested in ensuring rigour in their experiments. The list was compiled with help from amazing folks in the RLAI lab.
The list
- Deep Reinforcement Learning that Matters
- Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents
- How Many Random Seeds? Statistical Power Analysis in Deep Reinforcement Learning Experiments
- Generalized Domains for Empirical Evaluations in Reinforcement Learning
- An empirical analysis of reinforcement learning using design of experiments
- Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control
- Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO
- The Scientific Method in the Science of Machine Learning
- Deterministic Implementations for Reproducibility in Deep Reinforcement Learning
- Improving Reproducibility in Machine Learning Research
- Evaluating the Performance of Reinforcement Learning Algorithms
- Quantifying Generalization in Reinforcement Learning
- The Impact of Determinism on Learning Atari 2600 Games
- The Cross-environment Hyperparameter Setting Benchmark for Reinforcement Learning
- A Study on Overfitting in Deep Reinforcement Learning
Remarks
If you have paper suggestions that we could add to this list, please send me an email or open an issue in my website's GitHub repo.