Reinforcement Learning

Year: 2nd year
Semester: S1
Programme main editor: (I2CAT)
Onsite in: AU, UBB
Remote:
ECTS range: 5-7 ECTS

Professors:

  • Francesco De Pellegrini (AU)
  • Laura Dioşan (UBB)

Prerequisites:

Students are required to have taken an introductory machine learning course.

Good knowledge of probability and statistics is expected.

Basic knowledge of Markov chains is recommended but not required.

Pedagogical objectives:

This course provides an overview of reinforcement learning (RL) methods. It explores both the theoretical and the programming aspects in depth, so that students acquire solid expertise in both. By the end of the course, students should:

  • Understand the notion of stochastic approximation and its relation to RL;
  • Understand the basis of Markov decision theory;
  • Apply Dynamic Programming methods to solve the Bellman equations;
  • Master the basic techniques of reinforcement learning: Monte Carlo, Temporal-Difference and Policy Gradient methods;
  • Study a proof of convergence for RL algorithms;
  • Master more advanced techniques such as actor-critic methods and deep RL.

Evaluation modalities:

Final exam, lab reports and research project report.

All students also carry out a research project in reinforcement learning and write a short 5-page paper. Project topics, related to constrained RL and delayed RL, will be provided during the first class session.

Description:

This course introduces machine learning techniques based on stochastic approximation and Markov decision process (MDP) models, e.g., SARSA, Q-learning and policy gradient. Two homework assignments focus on implementing these techniques, so that students master them through direct implementation. A project carried out in teams of 2 or 3 students addresses more advanced techniques and problems in RL and, more generally, the application of Markov theory to modeling and optimization.

Lectures:

  • Course overview: introduction to Markov decision theory, stochastic approximation, and reinforcement learning;
  • Stochastic approximations: the Robbins-Monro algorithm;
  • Criteria for convergence;
  • Application to admission control problems;
  • Markov decision processes: definitions, average cost and discounted cost;
  • Bellman equations. Solutions based on Dynamic Programming;
  • Monte Carlo methods for Reinforcement Learning;
  • Temporal-Difference methods: SARSA and Q-learning;
  • Proof of convergence of Q-Learning;
  • Policy gradient: REINFORCE;
  • Actor-critic methods;
  • Multi-armed bandits;
  • Deep reinforcement learning.
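As an illustration of the Robbins-Monro algorithm listed above, the Python sketch below is a minimal example under stated assumptions, not course material. The toy problem (estimating the alpha-quantile of a standard normal from noisy indicator observations), the function name `robbins_monro_quantile`, and the gain `c = 5` are all hypothetical choices. The step sizes a_n = c/(n+1) satisfy the usual conditions (their sum diverges, the sum of their squares converges), which is the same structure that underlies the convergence of Q-learning.

```python
import random
from statistics import NormalDist


def robbins_monro_quantile(alpha, n_steps=200_000, c=5.0, seed=0):
    """Hypothetical example: Robbins-Monro estimate of the alpha-quantile of N(0, 1).

    We seek the root theta* of g(theta) = P(xi <= theta) - alpha, but at each
    step we only observe the noisy indicator 1{xi_n <= theta_n}.
    """
    rng = random.Random(seed)
    theta = 0.0
    for n in range(n_steps):
        a_n = c / (n + 1)          # sum a_n = inf, sum a_n^2 < inf
        xi = rng.gauss(0.0, 1.0)   # one fresh noisy sample per step
        indicator = 1.0 if xi <= theta else 0.0
        # Push theta up when too few samples fall below it, down otherwise.
        theta += a_n * (alpha - indicator)
    return theta


if __name__ == "__main__":
    est = robbins_monro_quantile(0.9)
    exact = NormalDist().inv_cdf(0.9)
    print(f"RM estimate {est:.4f}, exact {exact:.4f}")
```

The gain c is chosen larger than 1 because, for a_n = c/n, the usual 1/sqrt(n) convergence rate requires c times the slope of g at the root to exceed 1/2; with c = 1 the estimate would still converge, but more slowly.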

Lab assignments:

  • Practice of stochastic approximation on a traffic admission problem;
  • Practice of Monte Carlo, Q-learning and SARSA on gridworld (discounted cost);
  • Practice of buffer management with admission control (average cost).
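To give an idea of the gridworld exercise, here is a minimal tabular Q-learning sketch in the discounted-cost setting. It is not the lab's actual setup. The 4x4 grid, the unit cost per move, the start and goal cells, and the parameters (`alpha`, `gamma`, `eps`, number of episodes) are all illustrative assumptions. Because the objective is a cost, the greedy action and the bootstrap target use `min` instead of `max`.

```python
import random

SIZE = 4                                      # hypothetical 4x4 grid
START, GOAL = (0, 0), (3, 3)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Deterministic move; bumping into a wall leaves the agent in place."""
    r = min(max(state[0] + action[0], 0), SIZE - 1)
    c = min(max(state[1] + action[1], 0), SIZE - 1)
    nxt = (r, c)
    return nxt, 1.0, nxt == GOAL  # every move costs 1


def q_learning(episodes=3000, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    # Q[s][a] estimates the discounted cost-to-go; zero init is optimistic for costs.
    Q = {(r, c): [0.0] * len(ACTIONS) for r in range(SIZE) for c in range(SIZE)}
    for _ in range(episodes):
        s = START
        for _ in range(100):  # cap the episode length
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = min(range(len(ACTIONS)), key=lambda i: Q[s][i])  # greedy = min cost
            s2, cost, done = step(s, ACTIONS[a])
            # Off-policy TD target: bootstrap on the best next action, not the one taken.
            target = cost if done else cost + gamma * min(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
            if done:
                break
    return Q


def greedy_path_length(Q, max_steps=50):
    """Follow the greedy policy from START and count moves until GOAL."""
    s, n = START, 0
    while s != GOAL and n < max_steps:
        a = min(range(len(ACTIONS)), key=lambda i: Q[s][i])
        s, _, _ = step(s, ACTIONS[a])
        n += 1
    return n


if __name__ == "__main__":
    Q = q_learning()
    print("greedy path length:", greedy_path_length(Q))
```

On this grid the shortest path from start to goal takes 6 moves, so a converged greedy policy should reach the goal in 6 steps. Replacing `min(Q[s2])` with the Q-value of the next action actually selected by the epsilon-greedy policy would turn the update into SARSA (on-policy).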

Required teaching material:

Bibliography:
  • Artificial Intelligence: A Modern Approach, S. Russell and P. Norvig, Prentice Hall, 3rd edition, 2010.
  • Reinforcement Learning: An Introduction, R. S. Sutton and A. G. Barto, MIT Press, 2nd edition, 2018.

Teaching volume:
Lessons: 28-42 hours
Exercises:
Supervised lab: 0-28 hours
Project: 0-3 hours

Devices:

  • Laboratory-Based Course Structure
  • Open-Source Software Requirements