Weiran Huang (Chinese: 黄维然) is currently a researcher at the AI Theory Group of Noah's Ark Lab. He received his PhD degree in computer science from the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University, under the supervision of Prof. Andrew C. Yao and Wei Chen. He was also a visiting scholar at Harvard University, hosted by Prof. Yaron Singer. Before that, he received his BSc degree from the Department of Electronic Engineering, Tsinghua University. In 2007, he won a gold medal in the 24th Chinese Physics Olympiad (CPhO). His work has been published at top conferences, including NeurIPS, ICCV, KDD, AAAI, and IJCAI.
- Learning Theory (providing insights for algorithms)
- Meta Learning, Few-Shot Learning
- Online Learning, Incremental Learning
- Conference Reviewer: NeurIPS 2019, ASONAM 2019 (PC), COLT 2019, WINE 2017, NIPS 2016.
- Journal Reviewer: JASIST.
- Current: Aoxue Li (PKU).
- Past: Yihong Chen (UCL), Shifeng Zhang (Tsinghua), Yue Liu (PKU), Xingqiu He (UESTC), Yimin Huang (PKU), Junyang Li (PKU).
- Yimin Huang, Weiran Huang, Liang Li, Zhenguo Li, "Meta-Learning PAC-Bayes Priors in Model Averaging", AAAI 2020.
- Jiacheng Sun, Xiangyong Cao, Hanwen Liang, Weiran Huang, Zewei Chen, Zhenguo Li, "Some New Interpretations of Normalization Methods", AAAI 2020.
- Hanwen Liang*, Shifeng Zhang*, Jiacheng Sun, Xingqiu He, Weiran Huang, Kechen Zhuang, Zhenguo Li, "DARTS+: Improved Differentiable Architecture Search with Early Stopping", arXiv preprint arXiv:1909.06035 (2019).
[Paper]
Recently, there has been a growing interest in automating the process of neural architecture design, and the Differentiable Architecture Search (DARTS) method makes the process available within a few GPU days. In particular, a hyper-network called one-shot model is introduced, over which the architecture can be searched continuously with gradient descent. However, the performance of DARTS is often observed to collapse when the number of search epochs becomes large. Meanwhile, lots of "skip-connects" are found in the selected architectures. In this paper, we claim that the cause of the collapse is that there exist cooperation and competition in the bi-level optimization in DARTS, where the architecture parameters and model weights are updated alternately. Therefore, we propose a simple and effective algorithm, named "DARTS+", to avoid the collapse and improve the original DARTS, by "early stopping" the search procedure when meeting a certain criterion. We demonstrate that the proposed early stopping criterion is effective in avoiding the collapse issue. We also conduct experiments on benchmark datasets and show the effectiveness of our DARTS+ algorithm, where DARTS+ achieves 2.32% test error on CIFAR10, 14.87% on CIFAR100, and 23.7% on ImageNet. We further remark that the idea of "early stopping" is implicitly included in some existing DARTS variants by manually setting a small number of search epochs, while we give an explicit criterion for early stopping.
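As a rough illustration of what an explicit early-stopping check could look like, the sketch below counts how many edges of a cell currently prefer a skip-connect and stops the search once that count reaches a threshold. The abstract does not state the criterion, so the rule itself, the threshold `max_skips`, the function names, and the per-edge dictionary layout of `alpha` are all assumptions made for this example.

```python
def count_skip_connects(alpha):
    """Count edges whose highest-weighted operation is a skip-connect.

    alpha: list of dicts, one per edge, mapping an operation name to its
    architecture weight (a hypothetical layout chosen for this sketch).
    """
    return sum(1 for edge in alpha if max(edge, key=edge.get) == "skip_connect")


def should_stop(alpha, max_skips=2):
    """Hypothetical criterion: stop once `max_skips` edges prefer skip-connect."""
    return count_skip_connects(alpha) >= max_skips


# Illustrative architecture weights for a three-edge cell.
alpha = [
    {"skip_connect": 0.6, "sep_conv_3x3": 0.4},
    {"skip_connect": 0.2, "sep_conv_3x3": 0.8},
    {"skip_connect": 0.7, "max_pool_3x3": 0.3},
]
print(count_skip_connects(alpha), should_stop(alpha))
```

In a search loop, a check like this would run once per epoch, and the architecture from the last epoch before it fires would be kept.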
- Aoxue Li*, Tiange Luo*, Tao Xiang, Weiran Huang, Liwei Wang, "Few-Shot Learning with Global Class Representations", ICCV 2019.
[Paper, Project, Poster]
In this paper, we propose to tackle the challenging few-shot learning (FSL) problem by learning global class representations using both base and novel class training samples. In each training episode, an episodic class mean computed from a support set is registered with the global representation via a registration module. This produces a registered global class representation for computing the classification loss using a query set. Though following a similar episodic training pipeline as existing meta-learning-based approaches, our method differs significantly in that novel class training samples are involved in the training from the beginning. To compensate for the lack of novel class training samples, an effective sample synthesis strategy is developed to avoid overfitting. Importantly, by joint base-novel class training, our approach can be easily extended to a more practical yet challenging FSL setting, i.e., generalized FSL, where the label space of test data is extended to both base and novel classes. Extensive experiments show that our approach is effective for both FSL settings.
- Chang Xu, Weiran Huang, Hongwei Wang, Gang Wang, Tie-Yan Liu, "Modeling Local Dependence in Natural Language with Multi-Channel Recurrent Neural Networks", AAAI 2019. Oral paper.
[Paper]
Recurrent Neural Networks (RNNs) have been widely used in natural language processing tasks and have achieved great success. Traditional RNNs usually treat each token in a sentence uniformly and equally. However, this may miss the rich semantic structure information of a sentence, which is useful for understanding natural languages. Since semantic structures such as word dependence patterns are not parameterized, it is a challenge to capture and leverage structure information. In this paper, we propose an improved variant of RNN, Multi-Channel RNN (MC-RNN), to dynamically capture and leverage local semantic structure information. Concretely, MC-RNN contains multiple channels, each of which represents a local dependence pattern at a time. An attention mechanism is introduced to combine these patterns at each step, according to the semantic information. Then we parameterize structure information by adaptively selecting the most appropriate connection structures among channels. In this way, diverse local structures and dependence patterns in sentences can be well captured by MC-RNN. To verify the effectiveness of MC-RNN, we conduct extensive experiments on typical natural language processing tasks, including neural machine translation, abstractive summarization, and language modeling. Experimental results on these tasks all show significant improvements of MC-RNN over current top systems.
- Xiaowei Chen, Weiran Huang, Wei Chen, John C. S. Lui, "Community Exploration: From Offline Optimization to Online Learning", NeurIPS 2018.
[Paper, Full Version]
We introduce the community exploration problem, which has many real-world applications such as online advertising. In the problem, an explorer allocates a limited budget to explore communities so as to maximize the number of members he could meet. We provide a systematic study of the community exploration problem, from offline optimization to online learning. For the offline setting where the sizes of communities are known, we prove that the greedy methods for both non-adaptive and adaptive exploration are optimal. For the online setting where the sizes of communities are not known and need to be learned from the multi-round explorations, we propose an "upper confidence" like algorithm that achieves logarithmic regret bounds. By combining the feedback from different rounds, we can achieve a constant regret bound.
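A minimal sketch of greedy budget allocation for the offline, non-adaptive setting follows. It assumes a sampling model that the abstract does not spell out: each exploration of a community meets one of its members uniformly at random, so after k visits to a community of size d the next visit meets a new member with probability (1 - 1/d)^k. The function names are hypothetical.

```python
import heapq


def greedy_allocation(sizes, budget):
    """Assign `budget` explorations to communities, one unit at a time.

    Each step gives the next exploration to the community with the largest
    expected marginal gain, (1 - 1/d) ** k, under the assumed uniform model.
    """
    counts = [0] * len(sizes)
    # Max-heap via negated gains; the first visit always meets a new member.
    heap = [(-1.0, i) for i in range(len(sizes))]
    heapq.heapify(heap)
    for _ in range(budget):
        _, i = heapq.heappop(heap)
        counts[i] += 1
        heapq.heappush(heap, (-((1 - 1 / sizes[i]) ** counts[i]), i))
    return counts


def expected_members(sizes, counts):
    """Expected number of distinct members met under the assumed model."""
    return sum(d * (1 - (1 - 1 / d) ** k) for d, k in zip(sizes, counts))


sizes = [2, 10]
counts = greedy_allocation(sizes, budget=3)
print(counts, expected_members(sizes, counts))
```

With community sizes 2 and 10 and a budget of 3, the sketch visits the small community once and the large one twice, since a repeat visit to the small community has only a 0.5 chance of meeting someone new.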
- Lichao Sun, Weiran Huang, Philip S. Yu, Wei Chen, "Multi-Round Influence Maximization", KDD 2018. Oral paper, AR: 10.9%.
[Paper, Full Version, Video]
In this paper, we study the Multi-Round Influence Maximization (MRIM) problem, where influence propagates in multiple rounds independently from possibly different seed sets, and the goal is to select seeds for each round to maximize the expected number of nodes that are activated in at least one round. The MRIM problem models the viral marketing scenarios in which advertisers conduct multiple rounds of viral marketing to promote one product. We consider two different settings: 1) the non-adaptive MRIM, where the advertiser needs to determine the seed sets for all rounds at the very beginning, and 2) the adaptive MRIM, where the advertiser can select seed sets adaptively based on the propagation results in the previous rounds. For the non-adaptive setting, we design two algorithms that exhibit an interesting tradeoff between efficiency and effectiveness: a cross-round greedy algorithm that selects seeds at a global level and achieves 1/2 − ε approximation ratio, and a within-round greedy algorithm that selects seeds round by round and achieves 1 − exp(−(1−1/e)) − ε ≈ 0.46 − ε approximation ratio but saves running time by a factor related to the number of rounds. For the adaptive setting, we design an adaptive algorithm that guarantees 1 − exp(−(1−1/e)) − ε approximation to the adaptive optimal solution. In all cases, we further design scalable algorithms based on the reverse influence sampling approach and achieve near-linear running time. We conduct experiments on several real-world networks and demonstrate that our algorithms are effective for the MRIM task.
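The two approximation ratios quoted in the abstract can be checked numerically. The short sketch below only evaluates the stated expressions; the variable names are chosen for this example.

```python
import math

# Inner term 1 - 1/e, as it appears inside the exponent in the abstract.
inner = 1 - 1 / math.e

# Within-round greedy and adaptive ratio: 1 - exp(-(1 - 1/e)).
within_round = 1 - math.exp(-inner)

# Cross-round greedy ratio: 1/2.
cross_round = 0.5

print(f"1 - 1/e            = {inner:.4f}")
print(f"within-round ratio = {within_round:.4f}")
print(f"cross-round ratio  = {cross_round:.4f}")
```

The within-round value comes out at roughly 0.4685, which the abstract reports as 0.46. It sits slightly below the cross-round ratio of 1/2, which is the effectiveness side of the tradeoff the abstract describes.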
- Weiran Huang, Jungseul Ok, Liang Li, Wei Chen, "Combinatorial Pure Exploration with Continuous and Separable Reward Functions and Its Applications", IJCAI 2018.
[Paper, Full Version, Slides]
We study the Combinatorial Pure Exploration problem with Continuous and Separable reward functions (CPE-CS) in the stochastic multi-armed bandit setting. In a CPE-CS instance, we are given several stochastic arms with unknown distributions, as well as a collection of possible decisions. Each decision has a reward according to the distributions of arms. The goal is to identify the decision with the maximum reward, using as few arm samples as possible. The problem generalizes the combinatorial pure exploration problem with linear rewards, which has attracted significant attention in recent years. In this paper, we propose an adaptive learning algorithm for the CPE-CS problem, and analyze its sample complexity. In particular, we introduce a new hardness measure called the consistent optimality hardness, and give both the upper and lower bounds of sample complexity. Moreover, we give examples to demonstrate that our solution has the capacity to deal with non-linear reward functions.
- Weiran Huang, Liang Li, Wei Chen, "Partitioned Sampling of Public Opinions Based on Their Social Dynamics", AAAI 2017.
[Paper, Full Version, Poster]
Public opinion polling is usually done by random sampling from the entire population, treating individual opinions as independent. In the real world, individuals’ opinions are often correlated, e.g., among friends in a social network. In this paper, we explore the idea of partitioned sampling, which partitions individuals with high opinion similarities into groups and then samples every group separately to obtain an accurate estimate of the population opinion. We rigorously formulate the above idea as an optimization problem. We then show that the simple partitions which contain only one sample in each group are always better, and reduce finding the optimal simple partition to a well-studied Min-r-Partition problem. We adapt an approximation algorithm and a heuristic to solve the optimization problem. Moreover, to obtain opinion similarity efficiently, we adapt a well-known opinion evolution model to characterize social interactions, and provide an exact computation of opinion similarities based on the model. We use both synthetic and real-world datasets to demonstrate that the partitioned sampling method results in significant improvement in sampling quality and it is robust when some opinion similarities are inaccurate or even missing.
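The idea of a simple partition, with one sample drawn from each group, can be sketched as a size-weighted estimator. This is only an illustration under assumed inputs: the function name, the dictionary of opinions, and the hand-made grouping are hypothetical, and the sketch does not cover how the paper chooses the partition.

```python
import random


def partitioned_estimate(opinions, groups, rng):
    """Estimate the population mean opinion from one sample per group.

    opinions: dict mapping each individual to a numeric opinion.
    groups:   list of lists of individuals forming a partition.
    rng:      a random.Random instance used for sampling.
    """
    population = sum(len(group) for group in groups)
    total = 0.0
    for group in groups:
        sampled = rng.choice(group)
        # Each sample stands in for its whole group, weighted by group size.
        total += len(group) * opinions[sampled]
    return total / population


# Toy population in which members of each group hold identical opinions.
opinions = {0: 1, 1: 1, 2: 0, 3: 0, 4: 0}
groups = [[0, 1], [2, 3, 4]]
estimate = partitioned_estimate(opinions, groups, random.Random(0))
print(estimate)
```

When opinions within each group are identical, as in this toy example, two samples recover the true mean of 0.4 exactly. The more similar the opinions inside each group, the lower the variance of the estimate.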