Using Asynchronous Advantage Actor-Critic to Build an AI Game Bot for Arcade Games

  • Evan Kusuma Susanto ISTTS
  • Yosi Kristian
Keywords: Artificial Intelligence, Deep Reinforcement Learning, Machine Learning, Reinforcement Learning

Abstract

Asynchronous Advantage Actor-Critic (A3C) is a deep reinforcement learning algorithm developed by Google DeepMind. It can be used to build an artificial intelligence architecture that learns to play many different games through trial and error, observing only the game screen and the score its actions earn, without human intervention. An A3C network consists of a Convolutional Neural Network (CNN) at the front, a Long Short-Term Memory network (LSTM) in the middle, and an Actor-Critic network at the back. The CNN summarizes each screen image by extracting its most important features. The LSTM retains information about previous game states. The Actor-Critic network determines the best action to take in a given state. In our experiments, the method proved fairly effective and was able to beat novice players in the 5 games used for testing.
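The abstract describes the Actor-Critic stage only at a high level. As a hedged illustration, the following minimal pure-Python sketch computes the n-step advantage actor-critic loss in the style of the original A3C formulation for one worker's short rollout. It assumes the CNN and LSTM layers have already produced, at each step, the actor's action logits and the critic's value estimate. The function names and the coefficients `gamma=0.99`, `value_coef=0.5` and `entropy_coef=0.01` are illustrative assumptions, not values taken from this paper. The sketch computes loss values only; a real implementation would backpropagate them through the network with an autodiff framework and apply the gradients asynchronously to the shared parameters.

```python
import math
from typing import List, Tuple


def n_step_returns(rewards: List[float], bootstrap_value: float,
                   gamma: float = 0.99) -> List[float]:
    """Discounted returns R_t = r_t + gamma * R_{t+1}, seeded with V(s_{t+n}).

    bootstrap_value should be 0.0 when the rollout ended in a terminal state.
    """
    returns = []
    running = bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def softmax(logits: List[float]) -> List[float]:
    """Turn the actor's raw scores into action probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def a3c_loss(logits_seq: List[List[float]], values: List[float],
             actions: List[int], rewards: List[float], bootstrap_value: float,
             gamma: float = 0.99, value_coef: float = 0.5,
             entropy_coef: float = 0.01) -> Tuple[float, float, float, float]:
    """Hypothetical helper: combined actor, critic and entropy loss for one rollout."""
    returns = n_step_returns(rewards, bootstrap_value, gamma)
    policy_loss = 0.0
    value_loss = 0.0
    entropy = 0.0
    for logits, v, a, ret in zip(logits_seq, values, actions, returns):
        probs = softmax(logits)
        # Advantage: how much better the observed return was than the critic expected.
        # In a real framework it is treated as a constant (no gradient) for the actor term.
        advantage = ret - v
        policy_loss -= math.log(probs[a]) * advantage
        # Critic regresses its value estimate toward the n-step return.
        value_loss += 0.5 * (ret - v) ** 2
        # Entropy bonus discourages the policy from collapsing too early.
        entropy -= sum(p * math.log(p) for p in probs if p > 0)
    total = policy_loss + value_coef * value_loss - entropy_coef * entropy
    return total, policy_loss, value_loss, entropy
```

For example, a three-step rollout with rewards `[0.0, 0.0, 1.0]` that ends in a terminal state (bootstrap value 0) yields returns of approximately `[0.9801, 0.99, 1.0]` with `gamma=0.99`, so earlier actions receive a discounted share of the final reward.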

Published
2020-07-15