A research team led by Professor Sun Maosong, Associate Professor Liu Zhiyuan, and Assistant Research Fellow Han Xu from the Department of Computer Science and Technology at Tsinghua University, in collaboration with the large-model open-source community OpenBMB, has proposed the "Densing Law" for large language models (LLMs). The law shows that the maximum capability density of LLMs grows exponentially over time: from February 2023 to April 2025, the maximum capability density of open-source LLMs doubled approximately every 3.5 months. In other words, roughly every 3.5 months, a model with half the parameters can match the performance of current state-of-the-art models. This finding offers a new perspective on LLM development patterns and reveals that LLMs are intrinsically evolving toward greater efficiency.
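As a back-of-the-envelope illustration of what the 3.5-month doubling period implies (this is not the team's code; only the doubling constant and the February 2023 to April 2025 window are taken from the announcement above), a short Python sketch:

```python
DOUBLING_MONTHS = 3.5  # doubling period of maximum capability density reported by the team

def density_multiplier(months_elapsed: float) -> float:
    """Factor by which maximum capability density grows after the given number of months."""
    return 2 ** (months_elapsed / DOUBLING_MONTHS)

def parameter_fraction(months_elapsed: float) -> float:
    """Fraction of today's parameters needed later to reach the same performance."""
    return 1.0 / density_multiplier(months_elapsed)

for months in (3.5, 12, 26):  # 26 months spans roughly February 2023 to April 2025
    print(f"after {months:>4} months: density x{density_multiplier(months):7.1f}, "
          f"parameters needed x{parameter_fraction(months):.3f}")
```

Over the roughly 26 months from February 2023 to April 2025, this corresponds to a density increase of more than two orders of magnitude.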
Since 2020, guided by Scaling Laws, LLMs have continuously increased the scale of their training data and model parameters, achieving significant improvements in natural language understanding, generation, and reasoning tasks. This has driven the emergence of a series of ultra-large-scale models with hundreds of billions of parameters. However, as the training scale expands, the training and inference costs of LLMs have risen sharply. The growth of publicly available data can barely keep pace with the exponentially increasing demands of model training, while computational resource constraints have become major bottlenecks limiting LLM training and deployment. Addressing these challenges requires researchers to explore more sustainable development pathways for LLMs.
In response to this need, the research team drew inspiration from the density improvement pattern underlying Moore's Law. Building on the core hypothesis that models of different sizes, built with the same manufacturing process and fully trained, should have the same capability density, they proposed the concept of "capability density" for LLMs, which measures the level of capability contained per unit of parameters in a large language model.
The research team designed an evaluation framework for relative "capability density". First, the team selected a series of reference models and established the scaling relationship between their parameter sizes and performance. The capability density of these reference models was then set to 1, serving as the baseline for measuring other models. The capability density of a target model is defined as the ratio of the parameter size of a reference model with equivalent performance to the parameter size of the target model.
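A minimal sketch of this relative-density calculation, under simplifying assumptions, is shown below. The reference data points are hypothetical, and simple monotone interpolation stands in for the paper's fitted scaling curve; only the definition of density as a parameter-size ratio follows the description above.

```python
import numpy as np

# Hypothetical reference family: (parameter count, benchmark score), for illustration only.
ref_params = np.array([0.5e9, 1e9, 3e9, 7e9, 13e9, 34e9, 70e9])
ref_scores = np.array([0.30, 0.36, 0.45, 0.52, 0.57, 0.63, 0.68])

def capability_density(target_params: float, target_score: float) -> float:
    """Density = N_ref(equivalent performance) / N_target; reference models have density 1 by definition."""
    # Invert the reference scaling relationship: interpolate log(parameter count) at the target's score.
    # (The paper fits a parametric scaling curve; monotone interpolation is a stand-in here.)
    log_n_ref = np.interp(target_score, ref_scores, np.log(ref_params))
    return float(np.exp(log_n_ref)) / target_params

# Example: a hypothetical 2.4B-parameter model matching the 13B reference model's score
# has a capability density of about 13 / 2.4, i.e. roughly 5.4.
print(round(capability_density(2.4e9, 0.57), 1))
```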

Figure 1. Schematic diagram of the calculation method for "Capability Density"
The team conducted a capability density analysis of 51 open-source LLMs released in recent years. The results show that the maximum capability density of these models follows an exponential growth trend over time, doubling approximately every 3.5 months. This trend reveals the rapid advancement of LLM technology as well as the synergistic progress between AI algorithms and computing power.
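As an illustration of how such a doubling period can be estimated, the sketch below fits an exponential trend (a straight line in log space) to a maximum-density series; the data points are invented solely to reproduce a roughly 3.5-month doubling period and are not the paper's measurements.

```python
import numpy as np

# Hypothetical (months since February 2023, estimated maximum capability density) points,
# constructed only to illustrate the fitting procedure; they are not the paper's estimates.
months = np.array([0.0, 4.0, 8.0, 12.0, 16.0, 20.0, 24.0, 26.0])
max_density = np.array([1.0, 2.3, 4.8, 10.5, 23.0, 48.0, 110.0, 170.0])

# Exponential growth appears as a straight line in log space, so a linear least-squares fit
# of ln(density) against time gives the growth rate, and ln(2) / rate gives the doubling period.
growth_rate, _ = np.polyfit(months, np.log(max_density), 1)
print(f"estimated doubling period: {np.log(2) / growth_rate:.1f} months")
```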

Figure 2. The estimated capability density of open-source base LLMs
Based on the Densing Law, the research team derived several important corollaries. First, the parameter size and inference cost required for LLMs to achieve equivalent performance are decreasing exponentially. For example, the inference price per million tokens for GPT-3.5-level models was $20 at the end of 2022 but had dropped to roughly 1/266th of that price by August 2024. Second, since the release of ChatGPT, the growth of capability density has accelerated significantly, with increasingly efficient open-source LLMs being released. Third, combining the Densing Law with Moore's Law reveals tremendous potential for edge-device intelligence: as both chip computing power and LLM capability density grow exponentially, edge devices can run increasingly powerful LLMs, promoting the widespread adoption of edge computing.
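The arithmetic behind the first and third corollaries can be sketched as follows; the 18-month chip-compute doubling period used for the Moore's Law comparison is an illustrative assumption rather than a figure from the paper.

```python
import math

# Corollary 1: the quoted ~266x price drop for GPT-3.5-level inference took roughly
# 20 months (end of 2022 to August 2024), implying the cost of fixed capability
# halves about every 20 / log2(266) months.
cost_halving_months = 20 / math.log2(266)
print(f"inference cost for fixed capability halves every ~{cost_halving_months:.1f} months")

# Corollary 3: combining the Densing Law with Moore's Law. The 18-month chip-compute
# doubling period below is an illustrative assumption, not a figure from the paper.
density_doubling = 3.5   # months, from the Densing Law
compute_doubling = 18.0  # months, assumed chip-compute doubling period
combined_doubling = 1 / (1 / density_doubling + 1 / compute_doubling)
print(f"capability runnable on a fixed-cost edge device doubles every ~{combined_doubling:.1f} months")
```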
The research emphasizes that density optimization represents a critical development pathway for LLMs, driven by advances in model architecture, training algorithms, and data processing. The team has already released a series of high-capability-density models for edge devices, including MiniCPM, MiniCPM-V/o, and VoxCPM. The series has gained wide recognition in academia and industry, with related technical papers published in journals such as Nature Communications and Nature Machine Intelligence. Each of its ten open-source releases topped trending lists on HuggingFace and GitHub, and the models were selected for HuggingFace's 2024 list of the world's most popular and most downloaded open-source models.
The research findings, titled "Densing Law of LLMs," were published as the cover article of the November 20 issue of Nature Machine Intelligence, a Nature Portfolio journal.

Figure 3. The research findings published as the cover article in Nature Machine Intelligence
Dr. Xiao Chaojun, a postdoctoral researcher in the Department of Computer Science and Technology at Tsinghua University, is the first author of the paper. Han Xu, Liu Zhiyuan, and Sun Maosong serve as the corresponding authors. The research was supported by the National Natural Science Foundation of China, the Beijing Municipal Science and Technology Plan Project, the China National Postdoctoral Program for Innovative Talents, and the Shuimu Tsinghua Scholar Program.
Full article: https://www.nature.com/articles/s42256-025-01137-0
Editor: Li Han