Combinational Q-Learning for Dou Di Zhu

Published in AIIDE, 2020

In this paper, we study Dou Di Zhu, a popular Asian card game in which two adversarial groups of agents must consider numerous card combinations at each time step, leading to a huge number of actions. We propose a novel method for handling combinatorial actions, which we call combinational Q-learning (CQL). We employ a two-stage network to reduce the action space and leverage order-invariant max-pooling operations to extract relationships between primitive actions.
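
To make the two ideas in the abstract concrete, here is a minimal Python sketch; it is not the paper's implementation. It assumes that a hand is split into candidate sets of primitive actions (called "decompositions" here), that stage one scores each set from a max-pooled encoding, and that stage two picks one primitive action from the chosen set. The embeddings, the linear scoring weights, and the names `EMBED`, `max_pool`, `score`, and `two_stage_select` are hypothetical stand-ins for learned networks.

```python
# Hypothetical 3-dimensional embeddings for primitive actions (card groups).
# In a real system these would come from a learned network.
EMBED = {
    "pair_3": [0.1, 0.9, 0.2],
    "solo_K": [0.7, 0.3, 0.4],
    "trio_5": [0.5, 0.6, 0.8],
}


def max_pool(vectors):
    """Element-wise max over equal-length vectors.

    The result does not depend on the order of `vectors`, which is what
    makes the pooling order-invariant.
    """
    return [max(column) for column in zip(*vectors)]


def score(vector, weights=(1.0, 0.5, -0.25)):
    """Stand-in for a learned Q-value head: a fixed linear map (assumption)."""
    return sum(w * x for w, x in zip(weights, vector))


def encode(decomposition):
    """Encode a set of primitive actions as one pooled vector."""
    return max_pool([EMBED[p] for p in decomposition])


def two_stage_select(decompositions):
    """Stage 1: choose a decomposition. Stage 2: choose a move inside it.

    Scoring a handful of decompositions and then a handful of moves avoids
    scoring every possible card combination directly.
    """
    best = max(decompositions, key=lambda d: score(encode(d)))
    move = max(best, key=lambda p: score(EMBED[p]))
    return best, move


if __name__ == "__main__":
    candidates = [["pair_3", "solo_K"], ["trio_5"]]
    print(two_stage_select(candidates))
```

Max-pooling is used here because it treats the primitive actions as a set: reordering them leaves the pooled encoding unchanged, so the stage-one score does not depend on how the cards happen to be listed.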

Recommended citation: You, Y., Li, L., Guo, B., Wang, W., & Lu, C. (2019). Combinational Q-Learning for Dou Di Zhu. arXiv preprint arXiv:1901.08925.