Talk title: A race-DC in Big Data
Abstract: The divide-and-combine (DC) strategy has been widely used in big data analysis. Bias correction is crucial in the DC procedure for efficiently aggregating locally biased estimators, especially when the number of data batches is large. This paper establishes a race-DC via a residual-adjustment composition estimate (race). The race-DC applies to various types of biased estimators, including but not limited to the Lasso estimator, the Bridge estimator and the principal component estimator in linear regression, and the least squares estimator in nonlinear regression. The resulting global estimator is strictly unbiased under the linear model, is acceleratingly bias-reduced under the nonlinear model, and can achieve theoretical optimality when the number of data batches is large. Moreover, the race-DC is computationally simple because it is a least squares estimator in a pro forma linear regression. Detailed simulation studies demonstrate that the resulting global estimator is significantly bias-corrected, and that its behavior is comparable to that of the oracle estimator and much better than that of the competitors.
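To make the motivation concrete, the sketch below shows why naive averaging of locally biased estimators degrades as the number of batches grows. It is not the race-DC procedure, whose details the abstract does not give. It uses a ridge estimator (the Bridge estimator with exponent 2) as a stand-in because it has a closed form; the function names `naive_dc_ridge` and `simulate`, the penalty `lam`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def naive_dc_ridge(X, y, n_batches, lam):
    """Naive divide-and-combine: average the per-batch ridge estimates.

    Each batch estimate is shrunk toward zero, and simple averaging
    does nothing to remove that shrinkage bias.
    """
    p = X.shape[1]
    local_estimates = []
    for Xb, yb in zip(np.array_split(X, n_batches), np.array_split(y, n_batches)):
        # Closed-form ridge solution on one batch.
        beta_b = np.linalg.solve(Xb.T @ Xb + lam * np.eye(p), Xb.T @ yb)
        local_estimates.append(beta_b)
    return np.mean(local_estimates, axis=0)


def simulate(n=20000, seed=0):
    """Hypothetical linear model with three standard normal covariates."""
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, -2.0, 0.5])
    X = rng.standard_normal((n, 3))
    y = X @ beta + rng.standard_normal(n)
    return X, y, beta


if __name__ == "__main__":
    X, y, beta = simulate()
    for k in (2, 20, 200):
        est = naive_dc_ridge(X, y, k, lam=50.0)
        # The estimation error grows with the number of batches because
        # the fixed penalty weighs more heavily on smaller batches.
        print(k, np.linalg.norm(est - beta))
```

With a fixed penalty, each batch shrinks its estimate more as batches get smaller, so the averaged estimator drifts away from the true coefficients when the number of batches is large. This is the regime in which the abstract says bias correction matters most.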
Talk title: Large-Scale Multi-Class M-Estimation and Related Local Sampling Strategies
Abstract: In big data settings, the available datasets are so massive that computing statistics over the full sample is hardly feasible. A commonly used approach to this problem is to employ a subsampled dataset whose size is much smaller than that of the full sample. Although various sampling strategies have been proposed in the existing literature, widely applicable methods for general large-scale and multi-class models have not yet been well investigated, to the best of our knowledge. In this paper, based on a subsampled dataset, a corrected M-estimation objective function is introduced for general large-scale and multi-class models, and three local sampling strategies are proposed under the criteria of information, acceptability and robustness. The resulting estimator has standard asymptotic normality, and can be improved and optimized by the designed sampling strategies. The designed subsampling strategies have a local case-control framework and contain information from not only the acceptance probability but also the model function. Moreover, the subsampling schemes are computationally simple. The empirical performance of the proposed methods is compared with that of the competitors through both simulation studies and real-world datasets.
Talk title: Composition Estimation: An Asymptotically Weighted Least Squares Approach
Abstract: The purpose of this paper is three-fold. First, based on the asymptotic representation of initial estimators, and on model-independent parameters either hidden in the model or combined with the initial estimators, a pro forma linear regression between the initial estimators and the parameters is defined in an asymptotic sense. A weighted least squares estimation is then constructed within this framework. Second, systematic studies are conducted to examine when both variance and bias reductions can be achieved simultaneously and when only variance can be reduced. Third, a generic rule for constructing composite estimation and unified theoretical properties are introduced. Some important examples, such as quantile regression, nonparametric kernel estimation and blockwise empirical likelihood estimation, are investigated in detail to explain the methodology and theory. Simulations are conducted to examine its performance in finite-sample situations, and a real dataset is analyzed for illustration. A comparison with existing competitors is also made.
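As a much simplified illustration of combining initial estimators through weighted least squares, the sketch below treats several initial estimates of one scalar parameter as responses in an intercept-only regression and weights them by inverse variance. It assumes independent, unbiased initial estimators with known variances, which is far narrower than the framework described in the abstract; the function name `wls_combine` is hypothetical.

```python
import numpy as np


def wls_combine(estimates, variances):
    """Weighted least squares fit of an intercept-only regression.

    Minimizes sum_k (estimates[k] - theta)**2 / variances[k] over theta,
    which gives the inverse-variance weighted mean. Assumes independent,
    unbiased initial estimators with known variances.
    """
    estimates = np.asarray(estimates, dtype=float)
    weights = 1.0 / np.asarray(variances, dtype=float)
    combined = np.sum(weights * estimates) / np.sum(weights)
    # Variance of the combined estimator under the independence assumption.
    combined_var = 1.0 / np.sum(weights)
    return combined, combined_var


if __name__ == "__main__":
    theta_hat, var_hat = wls_combine([1.0, 3.0], [1.0, 3.0])
    print(theta_hat, var_hat)
```

The combined variance is never larger than the smallest input variance, which is the variance-reduction half of the story; the bias reduction discussed in the abstract needs the richer regression structure that this sketch leaves out.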
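Speaker bio: Lin Lu is a professor, doctoral supervisor and vice dean at the Institute for Financial Studies, Shandong University. After receiving a doctorate from Nankai University, Lin first taught at Nankai University and then moved to Shandong University, teaching there to this day. Lin's research covers high-dimensional statistics, nonparametric and semiparametric statistics, and financial statistics, with more than 100 research papers published in top international journals in statistics, machine learning and related applied disciplines, including Annals of Statistics and Journal of Machine Learning Research, as well as in other important journals. Lin has led a number of projects funded by the National Natural Science Foundation of China, the Specialized Research Fund for the Doctoral Program, and the Key Program of the Shandong Provincial Natural Science Foundation. Lin has received first and second prizes of the Statistical Science and Technology Progress Award from the National Bureau of Statistics and a first prize of the Shandong Province Outstanding Teaching Achievement Award. Lin is a core member of a National 973 Program project, a National Innovative Research Group and a Ministry of Education Innovation Team, a member of the Ministry of Education's steering committee for the professional master's degree in applied statistics, and a Counselor of the Shandong Provincial Government.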