Speaker: Prof. Weidong Liu
Title: Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning
Time: 10:10 a.m., Saturday, October 14, 2023
Venue: Academic Lecture Hall, School of Mathematics and Statistics, Jiangsu Normal University (Room 1506, Jingyuan Building)
Organizers: Institute of Mathematics; School of Mathematics and Statistics; Institute of Science and Technology
About the Speaker:
Weidong Liu is a Distinguished Professor at Shanghai Jiao Tong University, a recipient of the National Science Fund for Distinguished Young Scholars, and a council member of the China Society for Industrial and Applied Mathematics. His research interests include statistics and machine learning. He has published more than sixty papers in top journals and conferences in these fields, including AOS, JASA, JRSSB, Biometrika, JMLR, ICML, IJCAI, and IEEE TSP. He has served as principal investigator of a National Key R&D Program project, a National Science Fund for Distinguished Young Scholars grant, and a National Science Fund for Excellent Young Scholars grant.
Abstract:
Reinforcement learning has recently gained prominence in modern statistics, with policy evaluation as a key component. Unlike the traditional machine learning literature on this topic, our work emphasizes statistical inference for the parameter estimates computed by reinforcement learning algorithms. Most existing analyses assume that random rewards follow standard distributions, which limits their applicability; we instead bring robust statistics into reinforcement learning by addressing outlier contamination and heavy-tailed rewards simultaneously within a unified framework. In this paper, we develop an online robust policy evaluation procedure and establish the limiting distribution of our estimator based on its Bahadur representation. Furthermore, we develop a fully online procedure for conducting statistical inference efficiently from this asymptotic distribution. The paper bridges the gap between robust statistics and statistical inference in reinforcement learning, offering a more versatile and reliable approach to policy evaluation. Finally, we validate the efficacy of our algorithm through numerical experiments on real-world reinforcement learning tasks.
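The abstract does not spell out the procedure, so the following is only an illustrative sketch of the kind of method it describes: a TD(0) update with a Huber-clipped TD error (bounding the influence of heavy-tailed or contaminated rewards), Polyak-Ruppert averaging, and a fully online plug-in sandwich covariance used to form a confidence interval, in the spirit of a Bahadur representation. The toy Markov reward process, the feature map, the clipping threshold tau, the step-size schedule, and the i.i.d.-style covariance plug-in (which ignores Markov dependence) are all assumptions made for illustration, not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Markov reward process: 3 states, policy already folded into P.
n_states, gamma = 3, 0.9
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])
r_mean = np.array([1.0, 0.0, -1.0])   # mean reward in each state
phi = np.eye(n_states)                # tabular features (hypothetical choice)

def huber_psi(x, tau):
    """Huber score: caps the influence of any single (possibly
    contaminated or heavy-tailed) reward on the update."""
    return np.clip(x, -tau, tau)

d, tau, T = n_states, 2.0, 200_000
theta = np.zeros(d)       # current iterate
theta_bar = np.zeros(d)   # Polyak-Ruppert average (the estimator)
A_hat = np.zeros((d, d))  # running Jacobian estimate of the estimating equation
S_hat = np.zeros((d, d))  # running estimate of the score covariance
s = 0
for t in range(1, T + 1):
    s_next = int(rng.choice(n_states, p=P[s]))
    r = r_mean[s] + rng.standard_t(df=2.1)           # heavy-tailed reward noise
    x, x_next = phi[s], phi[s_next]
    delta = r + gamma * theta @ x_next - theta @ x   # TD error
    psi = huber_psi(delta, tau)
    theta += psi * x / t**0.7                        # robust TD(0) step
    theta_bar += (theta - theta_bar) / t             # online averaging
    # Plug-in pieces for a sandwich covariance; psi'(delta) = 1{|delta| <= tau}.
    jac = float(abs(delta) <= tau) * np.outer(x, x - gamma * x_next)
    A_hat += (jac - A_hat) / t
    S_hat += (psi**2 * np.outer(x, x) - S_hat) / t
    s = s_next

# Bahadur-style approximation: theta_bar - theta* ~ A^{-1} * (mean of scores),
# giving the sandwich covariance A^{-1} S A^{-T} / T (Markov dependence ignored).
A_inv = np.linalg.inv(A_hat)
cov = A_inv @ S_hat @ A_inv.T / T
se = np.sqrt(cov[0, 0])
print(f"robust value of state 0: {theta_bar[0]:.3f} ± {1.96 * se:.3f} (95% CI)")
```

In this sketch the clipping step is what keeps each update insensitive to a single outlying reward, while the asymptotic normality of the averaged iterate is what licenses the printed confidence interval; the talk's contribution is to establish such a limiting distribution rigorously and to make the inference step fully online.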