Yuxuan Wang 王雨轩
I am currently an undergraduate student in the School of Computer Science and Technology at Beijing Institute of Technology. I am also an intern at Mμ Lab, Institute of Artificial Intelligence, Peking University, where I work on model architecture under the guidance of Professor Muhan Zhang. In September 2026, I will begin my master's studies at the School of Software and Microelectronics, Peking University, under the supervision of Professor Muhan Zhang and Professor Zhonghai Wu.
News
Research
My research interests center on foundation models, particularly model compression, model acceleration, and related areas.
TPLA: Tensor Parallel Latent Attention for Efficient Disaggregated Prefill and Decode Inference
We propose Tensor-Parallel Latent Attention (TPLA): a scheme that partitions both the latent representation and each head's input dimension across devices, performs attention independently per shard, and then combines the results with an all-reduce. TPLA preserves the benefits of a compressed KV cache while unlocking tensor-parallel (TP) efficiency.
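Below is a minimal, single-process sketch of the sharding pattern this describes, written in plain NumPy: the latent dimension is sliced across simulated devices, each shard runs attention independently on its slice, and the partial outputs are summed in place of a cross-device all-reduce. All names, shapes, and toy dimensions here are illustrative assumptions, not the paper's reference implementation.

```python
# Minimal sketch of the TPLA sharding pattern (illustrative assumptions only).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def tpla_attention(q_latent, kv_latent, w_out, num_shards):
    """Shard the latent dimension across `num_shards` simulated devices.

    q_latent:  (seq_q, d_c)   queries projected into the latent space
    kv_latent: (seq_kv, d_c)  compressed KV cache (shared latent)
    w_out:     (d_c, d_model) output projection

    Each shard attends over its own slice of the latent dimension;
    the running sum of partial outputs stands in for the all-reduce
    that would run across real devices.
    """
    d_c = q_latent.shape[-1]
    assert d_c % num_shards == 0
    slice_w = d_c // num_shards
    out = np.zeros((q_latent.shape[0], w_out.shape[1]))
    for s in range(num_shards):                    # one iteration per "device"
        sl = slice(s * slice_w, (s + 1) * slice_w)
        q_s, kv_s = q_latent[:, sl], kv_latent[:, sl]
        scores = q_s @ kv_s.T / np.sqrt(slice_w)   # per-shard attention scores
        attn = softmax(scores, axis=-1)            # softmax independently per shard
        out += attn @ kv_s @ w_out[sl, :]          # partial output; += simulates all-reduce
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 64))
kv = rng.standard_normal((16, 64))
w_o = rng.standard_normal((64, 32))
print(tpla_attention(q, kv, w_o, num_shards=4).shape)  # (4, 32)
```

In a real deployment the loop body would execute on separate devices in parallel, with only the final summation requiring communication; this sketch only illustrates the data-partitioning idea.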
LooGLE v2: Are LLMs Ready for Real World Long Dependency Challenges?
LooGLE v2 is a benchmark designed to evaluate the long-context and long-dependency capabilities of large language models. Its key highlight is the use of ultra-long texts with a strong emphasis on long-range dependencies, built entirely from real-world texts and real tasks.
Education
Experience