Luojia Economic and Management Development Forum No. 106 - Mathematical Economics and Mathematical Finance Forum - Singularity-Aware Reinforcement Learning
Date: 2024-12-12

Topic: Singularity-Aware Reinforcement Learning

Speaker: Chen Xiaohong, Professor of Economics at Yale University, Member of the American Academy of Arts and Sciences, and Fellow of the Econometric Society.

Time: December 17, 2024, 08:40

Venue: EMS 378

Abstract:

Batch reinforcement learning (RL) aims at leveraging pre-collected data to find an optimal policy that maximizes the expected total rewards in a dynamic environment. Existing methods require an absolute continuity assumption (e.g., there do not exist non-overlapping regions) on the distribution induced by target policies with respect to the data distribution over either the state or action or both. We propose a new batch RL algorithm that allows for singularity for both state and action spaces (e.g., existence of non-overlapping regions between the offline data distribution and the distribution induced by the target policies) in the setting of an infinite-horizon Markov decision process with continuous states and actions. We call our algorithm STEEL: SingulariTy-awarE rEinforcement Learning. Our algorithm is motivated by a new error analysis on off-policy evaluation, where we use maximum mean discrepancy, together with distributionally robust optimization, to characterize the error of off-policy evaluation caused by the possible singularity and to enable model extrapolation. By leveraging the idea of pessimism and under some technical conditions, we derive a first finite-sample regret guarantee for our proposed algorithm under singularity. Compared with existing algorithms, by requiring only a minimal data-coverage assumption, STEEL improves the applicability and robustness of batch RL. In addition, a two-step adaptive STEEL, which is nearly tuning-free, is proposed. Extensive simulation studies and one (semi-)real experiment on personalized pricing demonstrate the superior performance of our methods in dealing with possible singularity in batch RL.

Guest Bio:

Chen Xiaohong is currently a Professor of Economics at Yale University. She graduated from the Department of Mathematics at Wuhan University in 1986 and received her Ph.D. in Economics from the University of California, San Diego, in 1993. In 2007, she was elected a Fellow of the Econometric Society, and in 2019, she was elected a Fellow of the American Academy of Arts and Sciences. She has won several prestigious awards, including the Multa Scripsit Award from the journal Econometric Theory, the Richard Stone Prize in Applied Econometrics, and the Arnold Zellner Award for Theoretical Econometrics. In 2017, in recognition of her groundbreaking contributions to econometrics, she was awarded the China Economics Prize, the highest honor in Chinese economics, by the Beijing Contemporary Economics Foundation. Professor Chen's research focuses primarily on generalized method of moments (GMM) estimation, as well as estimation and inference for various semiparametric and nonparametric models. She has published dozens of highly influential papers in top-tier academic journals.