T. Taleb and A. Kunz, Machine type communications in 3GPP networks: potential, challenges, and solutions, IEEE Commun. Mag, vol.50, issue.3, pp.178-184, 2012.

A. Azari, P. Popovski, G. Miao, and C. Stefanovic, Grant-free radio access for short-packet communications over 5G networks, Proc. IEEE Global Commun. Conf. (GLOBECOM), pp.1-7, 2017.

S. K. Sharma and X. Wang, Collaborative distributed Q-learning for RACH congestion minimization in cellular IoT networks, IEEE Commun. Lett, vol.23, issue.4, pp.600-603, 2019.

O. Naparstek and K. Cohen, Deep multi-user reinforcement learning for distributed dynamic spectrum access, IEEE Trans. Wireless Commun, vol.18, issue.1, pp.310-323, 2019.

J. Zhang, X. Tao, H. Wu, N. Zhang, and X. Zhang, Deep reinforcement learning for throughput improvement of uplink grant-free NOMA system, IEEE Internet Things J, pp.1-11, 2020.

M. Bande and V. V. Veeravalli, Multi-user multi-armed bandits for uncoordinated spectrum access, 2018.

A. Magesh and V. V. Veeravalli, Multi-player multi-armed bandits with non-zero rewards on collisions for uncoordinated spectrum access, 2019.

I. Bistritz and A. Leshem, Distributed multi-player bandits -a game of thrones approach, 32nd Proc. Int. Conf. on Neural Inf. Process. Syst., ser. NIPS'18, pp.7222-7232, 2018.

T. Lattimore and C. Szepesvri, Bandit Algorithms, 2020.

H. Liu, B. Krishnamachari, and Q. Zhao, Cooperation and learning in multiuser opportunistic spectrum access, IEEE Int. Conf. on Commun. Workshops, pp.487-492, 2008.

K. Liu and Q. Zhao, Distributed learning in multi-armed bandit with multiple players, IEEE Trans. Signal Process, vol.58, issue.11, pp.5667-5681, 2010.

J. Rosenski, O. Shamir, and L. Szlak, Multi-player bandits-a musical chairs approach, Int. Conf. on Mach. Learn, pp.155-163, 2016.

E. Boursier, V. Perchet, E. Kaufmann, and A. Mehrabian, A practical algorithm for multiplayer bandits when arm means vary among players, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02006069

J. R. Marden, H. P. Young, and L. Y. Pao, Achieving Pareto optimality through distributed learning, in SIAM J. Control Optim, issue.5, pp.2753-2770, 2014.

Y. Gai, B. Krishnamachari, and R. Jain, Combinatorial network optimization with unknown variables: Multi-armed bandits with linear rewards and individual observations, IEEE/ACM Trans. Netw, vol.20, issue.5, pp.1466-1478, 2012.

Y. Xia, T. Qin, W. Ma, N. Yu, and T. Liu, Budgeted multi-armed bandits with multiple plays, 25th Proc. Int. Joint Conf. on Artif. Intell, pp.2210-2216, 2016.

W. Hoeffding, Probability inequalities for sums of bounded random variables, J. Amer. Statist. Assoc, vol.58, issue.301, pp.13-30, 1963.

H. P. Young, The evolution of conventions, Econometrica, vol.61, issue.1, pp.57-84, 1993.

K. Chung, H. Lam, Z. Liu, and M. Mitzenmacher, Chernoff-Hoeffding bounds for Markov chains: Generalized and simplified, 29th Symp. Theor, pp.124-135, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00678208