Management Science Seminar
Event Date: 23 October 2015
Time: 1.30pm
Location: JA504
Speaker: Junchi Tan

Title: A reinforcement learning approach to the maintenance planning problem of a large-scale power plant

Abstract: Motivated by the maintenance planning problem at the Longannet Power Station of Scottish Power, this PhD project considers the maintenance policy optimisation problem of a large-scale power plant. Markov decision processes (MDPs) provide a general framework for such decision-making problems, in which the evolution of the system state is partly random and partly influenced by imposed actions. These problems are usually immense in size, however, and traditional dynamic programming (DP) algorithms break down because: (1) they require the system state transition probabilities, which are often hard to obtain for a large state space, especially when the transition mechanism is complex (the curse of modelling); (2) it is impractical to store these probabilities when the number of states exceeds one million (the curse of dimensionality); and (3) DP algorithms would take an impractically long time to solve industrial-size problems because of their brute-force nature. A simulation-based approach known as reinforcement learning (RL), however, can quickly deliver near-optimal solutions. Q-learning is one such algorithm, based on value iteration and the Robbins-Monro stochastic approximation scheme. The main idea of Q-learning is as follows: one learns the optimal actions by judiciously trying actions and updating one's knowledge while exploring the state space. Q-learning requires storing only a compact simulator to generate trajectories, rather than the full set of transition probabilities. Furthermore, Q-learning can use various function approximation methods (e.g. regression, neural networks) to scale further when the state space becomes very large.
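As a rough illustration of the idea described in the abstract, the following is a minimal sketch of tabular Q-learning in Python. It is not the speaker's code: the simulate function, the state and action lists, and all parameter values are hypothetical placeholders standing in for the project's compact simulator.

    import random
    from collections import defaultdict

    def q_learning(states, actions, simulate, episodes=1000, horizon=100,
                   alpha=0.1, gamma=0.95, epsilon=0.1):
        """Tabular Q-learning sketch.

        `simulate(state, action)` is a hypothetical placeholder for the
        compact simulator mentioned in the abstract: given a state and an
        action, it returns (next_state, reward).
        """
        Q = defaultdict(float)  # Q[(state, action)], initialised to 0
        for _ in range(episodes):
            state = random.choice(states)
            for _ in range(horizon):
                # Epsilon-greedy exploration: mostly exploit current
                # knowledge, occasionally try a random action.
                if random.random() < epsilon:
                    action = random.choice(actions)
                else:
                    action = max(actions, key=lambda a: Q[(state, a)])
                next_state, reward = simulate(state, action)
                # Robbins-Monro style update: move Q(s, a) a step of size
                # alpha toward the sampled Bellman target.
                target = reward + gamma * max(Q[(next_state, a)] for a in actions)
                Q[(state, action)] += alpha * (target - Q[(state, action)])
                state = next_state
        return Q

Note that only the simulator and the Q-table are stored; the transition probabilities themselves are never enumerated, which is what lets the method sidestep the curses of modelling and dimensionality discussed above.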
Published: 23 October 2015