Thomas William Anthony
Google DeepMind
Verified email at google.com
Title
Cited by
Year
Thinking fast and slow with deep learning and tree search
TW Anthony, Z Tian, D Barber
Advances in Neural Information Processing Systems, 5360-5370, 2017
Cited by: 401; Year: 2017
OpenSpiel: A framework for reinforcement learning in games
M Lanctot, E Lockhart, JB Lespiau, V Zambaldi, S Upadhyay, J Pérolat, ...
arXiv preprint arXiv:1908.09453, 2019
Cited by: 257; Year: 2019
Mastering the game of Stratego with model-free multiagent reinforcement learning
J Perolat, B De Vylder, D Hennes, E Tarassov, F Strub, V de Boer, ...
Science 378 (6623), 990-996, 2022
Cited by: 171; Year: 2022
From Poincaré recurrence to convergence in imperfect information games: Finding equilibrium via regularization
J Perolat, R Munos, JB Lespiau, S Omidshafiei, M Rowland, P Ortega, ...
International Conference on Machine Learning, 8525-8535, 2021
Cited by: 82; Year: 2021
On the role of planning in model-based deep reinforcement learning
JB Hamrick, AL Friesen, F Behbahani, A Guez, F Viola, S Witherspoon, ...
arXiv preprint arXiv:2011.04021, 2020
Cited by: 74; Year: 2020
Learning to Play No-Press Diplomacy with Best Response Policy Iteration
T Anthony, T Eccles, A Tacchetti, J Kramár, I Gemp, TC Hudson, N Porcel, ...
arXiv preprint arXiv:2006.04635, 2020
Cited by: 51; Year: 2020
Policy Gradient Search: Online Planning and Expert Iteration without Search Trees
TW Anthony, R Nishihara, P Moritz, T Salimans, J Schulman
arXiv preprint arXiv:1904.03646, 2019
Cited by: 31; Year: 2019
OpenSpiel: A Framework for Reinforcement Learning in Games. CoRR abs/1908.09453 (2019)
M Lanctot, E Lockhart, JB Lespiau, V Zambaldi, S Upadhyay, J Pérolat, ...
arXiv preprint cs.LG/1908.09453, 2019
Cited by: 24; Year: 2019
Learning to Resolve Alliance Dilemmas in Many-Player Zero-Sum Games
E Hughes, TW Anthony, T Eccles, JZ Leibo, D Balduzzi, Y Bachrach
arXiv preprint arXiv:2003.00799, 2020
Cited by: 23; Year: 2020
Iterative Empirical Game Solving via Single Policy Best Response
MO Smith, T Anthony, MP Wellman
Cited by: 19*
Sample-based Approximation of Nash in Large Many-Player Games via Gradient Descent
I Gemp, R Savani, M Lanctot, Y Bachrach, T Anthony, R Everett, ...
arXiv preprint arXiv:2106.01285, 2021
Cited by: 18; Year: 2021
Smooth markets: A basic mechanism for organizing gradient-based learners
D Balduzzi, WM Czarnecki, TW Anthony, IM Gemp, E Hughes, JZ Leibo, ...
arXiv preprint arXiv:2001.04678, 2020
Cited by: 17; Year: 2020
Learning to play against any mixture of opponents
MO Smith, T Anthony, MP Wellman
Frontiers in Artificial Intelligence 6, 2023
Cited by: 14; Year: 2023
Turbocharging solution concepts: Solving NEs, CEs and CCEs with neural equilibrium solvers
L Marris, I Gemp, T Anthony, A Tacchetti, S Liu, K Tuyls
Advances in Neural Information Processing Systems 35, 5586-5600, 2022
Cited by: 13; Year: 2022
Expert iteration
TW Anthony
UCL (University College London), 2021
Cited by: 7; Year: 2021
Heterogeneous Social Value Orientation Leads to Meaningful Diversity in Sequential Social Dilemmas
U Madhushani, KR McKee, JP Agapiou, JZ Leibo, R Everett, T Anthony, ...
arXiv preprint arXiv:2305.00768, 2023
Cited by: 4; Year: 2023
Population-based Evaluation in Repeated Rock-Paper-Scissors as a Benchmark for Multiagent Reinforcement Learning
M Lanctot, J Schultz, N Burch, MO Smith, D Hennes, T Anthony, J Perolat
arXiv preprint arXiv:2303.03196, 2023
Cited by: 4; Year: 2023
Designing all-pay auctions using deep learning and multi-agent simulation
I Gemp, T Anthony, J Kramar, T Eccles, A Tacchetti, Y Bachrach
Scientific Reports 12 (1), 16937, 2022
Cited by: 4; Year: 2022
Developing, evaluating and scaling learning agents in multi-agent environments
I Gemp, T Anthony, Y Bachrach, A Bhoopchand, K Bullard, J Connor, ...
AI Communications 35 (4), 271-284, 2022
Cited by: 4; Year: 2022
Strategic Knowledge Transfer
MO Smith, T Anthony, MP Wellman
Journal of Machine Learning Research 24 (233), 1-96, 2023
Cited by: 3; Year: 2023