Weiran Huang (Chinese: 黄维然) is currently a senior researcher in the AI Theory Group of Noah's Ark Lab. He received his PhD degree in computer science from the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University, under the supervision of Prof. Andrew C. Yao and Prof. Wei Chen. He was a visiting scholar at Harvard University hosted by Prof. Yaron Singer, and also served as a research intern at Microsoft Research Asia (MSRA). Before that, he received his BE degree from the Department of Electronic Engineering, Tsinghua University. In 2007, he won a gold medal in the 24th Chinese Physics Olympiad (CPhO) and was selected for the National Training Team (国家集训队). His work has been published at top conferences such as NeurIPS, ICCV, KDD, AAAI, and IJCAI. He has also served as a PC member or reviewer for NeurIPS, COLT, IJCAI, ASONAM, WINE, JASIST, etc.


  • Outstanding Graduate Student, 2018. [Media Coverage]
  • Excellence in the Microsoft Research Asia Internship Program, 2018. [Media Coverage]
  • Excellent Student Leader of Tsinghua, President of IIIS Student Union, 2013. [Media Coverage]
  • Tsinghua-Baidu Scholarship, Tsinghua Social Work Scholarship, Tsinghua Freshman Scholarship, 2008 - 2013.
  • Top 2 in Tsinghua Mathematical Contest in Modeling, 2012.
  • Excellent work in the Tencent Internet Development Competition, 2012.
  • Invited to the 11th Wu Chien-Shiung Science Camp, and won the Osheroff Prize, 2008. [Media Coverage]
  • 1st Prize (gold medal) in the 24th Chinese Physics Olympiad (CPhO), and selected for the National Training Team (31 people in China), 2007. [Media Coverage]

Research Interests

  • Machine Learning Theory: Learning theory aims to understand the fundamental principles of machine learning, e.g., network capacity, optimization methods, and generalization ability, and to provide insights for algorithm design. Theoretical research on machine learning is very important, since it can prevent machine learning from becoming alchemy and guide the design of new algorithms and frameworks.
  • Few-Shot Learning/Federated Learning: Deep learning has achieved great success in various tasks; however, it requires large amounts of labeled data for model training. This severely limits its applications: in many scenarios, collecting a large number of labeled samples is costly, infeasible, or even impossible, e.g., for medical data and mobile users' data. To overcome this limitation, it is increasingly important to learn a model from only a few labeled samples by transferring generic knowledge or information from other domains or other devices.
P.S. We are constantly looking for self-motivated research interns. Please send me your CV if you are interested. (See Details)


  • Professional Services:
    • Conference Reviewer: IJCAI 2020 (PC), ASONAM 2020 (PC), NeurIPS 2019, ASONAM 2019 (PC), COLT 2019, WINE 2017, NIPS 2016.
    • Journal Reviewer: JASIST.
  • Interns and Students:
    • Current: Peibin Chen (PKU), Aoxue Li (PKU).
    • Past: Yihong Chen (UCL), Shifeng Zhang (Tsinghua), Yue Liu (PKU), Xingqiu He (UESTC), Yimin Huang (PKU), Junyang Li (PKU).


  1. Yimin Huang, Weiran Huang, Liang Li, Zhenguo Li, "Meta-Learning PAC-Bayes Priors in Model Averaging", AAAI 2020.
    [Paper, Full Version]
    Nowadays model uncertainty has become one of the most important problems in both academia and industry. In this paper, we mainly consider the scenario in which we have a common model set used for model averaging, instead of selecting a single final model via a model selection procedure, to account for this model uncertainty and improve the reliability and accuracy of inferences. Here one main challenge is to learn the prior over the model set. To tackle this problem, we propose two data-based algorithms to get proper priors for model averaging. One is for the meta-learner: the analysts use historical similar tasks to extract information about the prior. The other is for the base-learner: a subsampling method is used to deal with the data step by step. Theoretically, an upper bound on the risk of our algorithm is presented to guarantee worst-case performance. In practice, both methods perform well in simulations and real data studies, especially with poor-quality data.
  2. Jiacheng Sun, Xiangyong Cao, Hanwen Liang, Weiran Huang, Zewei Chen, Zhenguo Li, "Some New Interpretations of Normalization Methods", AAAI 2020.
    [To Appear]
  3. Hanwen Liang*, Shifeng Zhang*, Jiacheng Sun, Xingqiu He, Weiran Huang, Kechen Zhuang, Zhenguo Li, "DARTS+: Improved Differentiable Architecture Search with Early Stopping", arXiv preprint arXiv:1909.06035 (2019).
    [Paper, Media Coverage]
    Recently, there has been a growing interest in automating the process of neural architecture design, and the Differentiable Architecture Search (DARTS) method makes the process available within a few GPU days. In particular, a hyper-network called the one-shot model is introduced, over which the architecture can be searched continuously with gradient descent. However, the performance of DARTS is often observed to collapse when the number of search epochs becomes large. Meanwhile, lots of "skip-connects" are found in the selected architectures. In this paper, we claim that the cause of the collapse is that there exist cooperation and competition in the bi-level optimization in DARTS, where the architecture parameters and model weights are updated alternately. Therefore, we propose a simple and effective algorithm, named "DARTS+", to avoid the collapse and improve the original DARTS, by "early stopping" the search procedure when a certain criterion is met. We demonstrate that the proposed early stopping criterion is effective in avoiding the collapse issue. We also conduct experiments on benchmark datasets and show the effectiveness of our DARTS+ algorithm, where DARTS+ achieves 2.32% test error on CIFAR10, 14.87% on CIFAR100, and 23.7% on ImageNet. We further remark that the idea of "early stopping" is implicitly included in some existing DARTS variants by manually setting a small number of search epochs, while we give an explicit criterion for early stopping.
  4. Aoxue Li*, Tiange Luo*, Tao Xiang, Weiran Huang, Liwei Wang, "Few-Shot Learning with Global Class Representations", ICCV 2019.
    [Paper, Code, Poster]
    In this paper, we propose to tackle the challenging few-shot learning (FSL) problem by learning global class representations using both base and novel class training samples. In each training episode, an episodic class mean computed from a support set is registered with the global representation via a registration module. This produces a registered global class representation for computing the classification loss using a query set. Though following an episodic training pipeline similar to that of existing meta-learning-based approaches, our method differs significantly in that novel class training samples are involved in the training from the beginning. To compensate for the lack of novel class training samples, an effective sample synthesis strategy is developed to avoid overfitting. Importantly, by joint base-novel class training, our approach can be easily extended to a more practical yet challenging FSL setting, i.e., generalized FSL, where the label space of test data is extended to both base and novel classes. Extensive experiments show that our approach is effective for both FSL settings.
  5. Chang Xu, Weiran Huang, Hongwei Wang, Gang Wang, Tie-Yan Liu, "Modeling Local Dependence in Natural Language with Multi-Channel Recurrent Neural Networks", AAAI 2019, Oral Paper.
    Recurrent Neural Networks (RNNs) have been widely used in natural language processing tasks and have achieved huge success. Traditional RNNs usually treat each token in a sentence uniformly and equally. However, this may miss the rich semantic structure information of a sentence, which is useful for understanding natural languages. Since semantic structures such as word dependence patterns are not parameterized, it is a challenge to capture and leverage structure information. In this paper, we propose an improved variant of RNN, Multi-Channel RNN (MC-RNN), to dynamically capture and leverage local semantic structure information. Concretely, MC-RNN contains multiple channels, each of which represents a local dependence pattern at a time. An attention mechanism is introduced to combine these patterns at each step, according to the semantic information. Then we parameterize structure information by adaptively selecting the most appropriate connection structures among channels. In this way, diverse local structures and dependence patterns in sentences can be well captured by MC-RNN. To verify the effectiveness of MC-RNN, we conduct extensive experiments on typical natural language processing tasks, including neural machine translation, abstractive summarization, and language modeling. Experimental results on these tasks all show significant improvements of MC-RNN over current top systems.
  6. Xiaowei Chen, Weiran Huang, Wei Chen, John C. S. Lui, "Community Exploration: From Offline Optimization to Online Learning", NeurIPS 2018.
    [Paper, Full Version]
    We introduce the community exploration problem that has many real-world applications such as online advertising. In the problem, an explorer allocates a limited budget to explore communities so as to maximize the number of members he could meet. We provide a systematic study of the community exploration problem, from offline optimization to online learning. For the offline setting where the sizes of communities are known, we prove that the greedy methods for both non-adaptive and adaptive exploration are optimal. For the online setting where the sizes of communities are not known and need to be learned from multi-round explorations, we propose an "upper confidence"-like algorithm that achieves logarithmic regret bounds. By combining the feedback from different rounds, we can achieve a constant regret bound.
  7. Lichao Sun, Weiran Huang, Philip S. Yu, Wei Chen, "Multi-Round Influence Maximization", KDD 2018, Oral Paper (AR: 10.9%).
    [Paper, Full Version, Video]
    In this paper, we study the Multi-Round Influence Maximization (MRIM) problem, where influence propagates in multiple rounds independently from possibly different seed sets, and the goal is to select seeds for each round to maximize the expected number of nodes that are activated in at least one round. The MRIM problem models the viral marketing scenarios in which advertisers conduct multiple rounds of viral marketing to promote one product. We consider two different settings: 1) the non-adaptive MRIM, where the advertiser needs to determine the seed sets for all rounds at the very beginning, and 2) the adaptive MRIM, where the advertiser can select seed sets adaptively based on the propagation results in the previous rounds. For the non-adaptive setting, we design two algorithms that exhibit an interesting tradeoff between efficiency and effectiveness: a cross-round greedy algorithm that selects seeds at a global level and achieves a 1/2 − ε approximation ratio, and a within-round greedy algorithm that selects seeds round by round and achieves a 1 − exp(−(1−1/e)) − ε ≈ 0.46 − ε approximation ratio but saves running time by a factor related to the number of rounds. For the adaptive setting, we design an adaptive algorithm that guarantees a 1 − exp(−(1−1/e)) − ε approximation to the adaptive optimal solution. In all cases, we further design scalable algorithms based on the reverse influence sampling approach and achieve near-linear running time. We conduct experiments on several real-world networks and demonstrate that our algorithms are effective for the MRIM task.
  8. Weiran Huang, Jungseul Ok, Liang Li, Wei Chen, "Combinatorial Pure Exploration with Continuous and Separable Reward Functions and Its Applications", IJCAI 2018.
    [Paper, Full Version, Slides]
    We study the Combinatorial Pure Exploration problem with Continuous and Separable reward functions (CPE-CS) in the stochastic multi-armed bandit setting. In a CPE-CS instance, we are given several stochastic arms with unknown distributions, as well as a collection of possible decisions. Each decision has a reward according to the distributions of arms. The goal is to identify the decision with the maximum reward, using as few arm samples as possible. The problem generalizes the combinatorial pure exploration problem with linear rewards, which has attracted significant attention in recent years. In this paper, we propose an adaptive learning algorithm for the CPE-CS problem, and analyze its sample complexity. In particular, we introduce a new hardness measure called the consistent optimality hardness, and give both the upper and lower bounds of sample complexity. Moreover, we give examples to demonstrate that our solution has the capacity to deal with non-linear reward functions.
  9. Weiran Huang, Liang Li, Wei Chen, "Partitioned Sampling of Public Opinions Based on Their Social Dynamics", AAAI 2017.
    [Paper, Full Version, Poster]
    Public opinion polling is usually done by random sampling from the entire population, treating individual opinions as independent. In the real world, individuals’ opinions are often correlated, e.g., among friends in a social network. In this paper, we explore the idea of partitioned sampling, which partitions individuals with high opinion similarities into groups and then samples every group separately to obtain an accurate estimate of the population opinion. We rigorously formulate the above idea as an optimization problem. We then show that the simple partitions which contain only one sample in each group are always better, and reduce finding the optimal simple partition to a well-studied Min-r-Partition problem. We adapt an approximation algorithm and a heuristic to solve the optimization problem. Moreover, to obtain opinion similarity efficiently, we adapt a well-known opinion evolution model to characterize social interactions, and provide an exact computation of opinion similarities based on the model. We use both synthetic and real-world datasets to demonstrate that the partitioned sampling method results in significant improvement in sampling quality and it is robust when some opinion similarities are inaccurate or even missing.