Some of the most significant insights into history come from the lives of individual people rather than from memorable events. The China Biographical Database (CBDB) is quietly transforming how scholars understand Chinese history, not through snippets and fragments of stories, but through a vast, interconnected biographical dataset that spans more than a thousand years.
The openly accessible database, created by researchers at Harvard University, Peking University, and Academia Sinica, is an extremely powerful tool for researchers and historians to explore how people, institutions, and ideas transformed from the Tang dynasty (7th century) to the Qing dynasty (19th century).
What Is the China Biographical Database?
The China Biographical Database (CBDB) is a relational database that, as of May 2025, contains biographical information about approximately 649,000 historical individuals. These individuals include scholars, officials, poets, monks, and other notable people across almost 1,200 years of Chinese history.
According to the official CBDB site, the data is “freely accessible and intended to support statistical, social network, and spatial analysis, as well as serve as a comprehensive biographical reference.”
In practice, this means that researchers can search and analyze:
- Personal names and aliases (including Chinese characters and transliterations)
- Official positions and career paths
- Geographic origins and movement across dynasties
- Social, familial, and mentorship relationships
- References to historical sources that mention each individual
By integrating these data points, the CBDB turns centuries of textual history into a structured, queryable dataset — one that can be explored like a living social network.
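To make the idea of a structured, queryable dataset concrete, here is a minimal sketch in Python using an in-memory SQLite database. The table names, column names, and sample rows are invented for illustration (Zhang San and Li Si are conventional placeholder names, much like "John Doe"). They are not the actual CBDB schema, which should be checked against the project's own documentation.

```python
import sqlite3

# Hypothetical, simplified schema: NOT the real CBDB table layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,   -- romanized name
    name_chn  TEXT,            -- name in Chinese characters
    home      TEXT             -- place of origin
);
CREATE TABLE posting (
    person_id INTEGER REFERENCES person(person_id),
    office    TEXT NOT NULL,
    year      INTEGER
);
""")

# Placeholder people and postings, made up for this example.
conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?)", [
    (1, "Zhang San", "張三", "Place A"),
    (2, "Li Si", "李四", "Place B"),
])
conn.executemany("INSERT INTO posting VALUES (?, ?, ?)", [
    (1, "Magistrate", 1050),
    (1, "Prefect", 1062),
    (2, "Magistrate", 1055),
])


def career(conn, name):
    """Return (office, year) rows for one person, ordered by year."""
    return conn.execute(
        """SELECT posting.office, posting.year
           FROM person JOIN posting USING (person_id)
           WHERE person.name = ?
           ORDER BY posting.year""",
        (name,),
    ).fetchall()


print(career(conn, "Zhang San"))
```

The point of the join is that a person and that person's offices live in separate tables linked by an identifier, which is what lets a single query reconstruct a career path.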
Origins and Development of CBDB
The CBDB project began with the work of Robert M. Hartwell (1932–1996), a historian of the Song dynasty who envisioned a digital biographical resource decades before the term “big data” even existed. Hartwell’s pioneering effort to compile structured information about Chinese officials laid the foundation for what would later become the CBDB.
Upon his passing, Hartwell bequeathed his database and estate to the Harvard-Yenching Institute, ensuring his research would continue. His legacy was carried forward by Peter K. Bol, who assumed responsibility for the project, and Michael A. Fuller, who redesigned it into the relational database format still in use today.
Currently, the CBDB is a joint initiative of:
- The Fairbank Center for Chinese Studies, Harvard University (費正清中國研究中心)
- The Institute of History and Philology, Academia Sinica (中研院歷史語言研究所)
- The Center for Research on Ancient Chinese History, Peking University (北京大學中國古代史研究中心)
The project’s Senior Manager, Hongsu Wang, continues to oversee the evolution of the database, which has grown substantially in both scope and technical sophistication.
The Power of Structured History
What makes the China Biographical Database so innovative is its use of data modeling to interpret the past. Instead of treating historical records as static narratives, the CBDB turns them into relational data — linking people, places, and institutions dynamically.
This allows scholars to perform analyses that were once impossible, such as:
- Social network mapping: tracing mentorships, marriages, or official hierarchies
- Spatial analysis: mapping where elite groups originated, migrated, or were posted
- Temporal trends: comparing career paths across dynasties or administrative reforms
For example, an animation produced by Merrick Lex Berman of the China Historical GIS Project (CHGIS) used CBDB data to map how holders of the jinshi degree (進士, the highest degree in the civil service examinations) were distributed geographically over the course of the Ming dynasty. Such visualizations show historians how power and education shifted across regions in early modern China, an understanding that previously required years of archival research.
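Social network mapping of the kind described above can be sketched in a few lines of Python. The relationship rows below are invented placeholders, and the tuple layout is an assumption rather than CBDB's actual export format. The sketch builds an undirected graph and uses breadth-first search to find the shortest chain of relationships between two people.

```python
from collections import defaultdict, deque

# Hypothetical relationship rows: (person_a, person_b, relationship type).
RELATIONS = [
    ("A", "B", "teacher-student"),
    ("B", "C", "marriage"),
    ("C", "D", "superior-subordinate"),
    ("E", "F", "teacher-student"),
]


def build_graph(relations):
    """Build an undirected adjacency map from relationship rows."""
    graph = defaultdict(set)
    for a, b, _kind in relations:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of people or None if unconnected."""
    if start == goal:
        return [start]
    seen = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        for neighbor in sorted(graph.get(path[-1], ())):
            if neighbor in seen:
                continue
            if neighbor == goal:
                return path + [neighbor]
            seen.add(neighbor)
            queue.append(path + [neighbor])
    return None


graph = build_graph(RELATIONS)
print(shortest_path(graph, "A", "D"))
print(shortest_path(graph, "A", "F"))
```

For real analyses, a dedicated graph library would likely be a better fit than hand-written search, but the underlying idea is the same.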
Data Coverage and Methodology
The CBDB team’s methodology is both meticulous and transparent. According to their methodology documentation, every biographical entry is drawn from verifiable historical sources and cross-referenced with other datasets for consistency.
As of 2025, the database includes individuals from the:
- Tang dynasty (618–907)
- Five Dynasties and Ten Kingdoms period (907–979)
- Liao, Song, Jin, Yuan, Ming, and Qing dynasties (10th–19th centuries)
Each record typically contains dozens of relational fields: name variants, place of birth, official ranks, teachers, students, social affiliations, and citations to classical texts.
The goal, as outlined on the Sources and Coverage page, is to systematically include all significant biographical material from China’s historical record and make it freely available for academic use — without restriction.
How Scholars Use the China Biographical Database
CBDB isn’t just a reference tool — it’s an engine for discovery. Researchers across the world use it for projects in digital humanities, historical sociology, geographic information systems (GIS), and text mining.
Some common applications include:
- Mapping kinship networks among officials to study patterns of nepotism or meritocracy.
- Tracking migration and settlement patterns to understand how elite families spread influence across provinces.
- Analyzing career mobility by dynasty, examining how quickly scholars advanced within bureaucratic hierarchies.
- Comparing data with other open databases, such as CHGIS or the China Historical Texts Project, to enrich regional and temporal context.
Because CBDB data can be exported in CSV or MySQL format, it’s also popular among data scientists experimenting with visualization and machine-learning tools to detect unseen historical patterns.
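As a rough illustration of the career-mobility analysis mentioned above, the following Python sketch reads CSV text and computes the average number of years between two offices, grouped by dynasty. The column names (person_id, dynasty, office, year) and every sample row are assumptions made up for this example. A real export would need to be inspected first.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Invented sample rows standing in for a CSV export.
SAMPLE_CSV = """person_id,dynasty,office,year
1,Song,Magistrate,1050
1,Song,Prefect,1062
2,Song,Magistrate,1055
2,Song,Prefect,1063
3,Ming,Magistrate,1500
3,Ming,Prefect,1506
"""


def years_to_promotion(csv_text, first="Magistrate", second="Prefect"):
    """Average years between holding `first` and `second`, per dynasty."""
    # (dynasty, person_id) -> {office: year}
    offices_by_person = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["dynasty"], row["person_id"])
        offices_by_person[key][row["office"]] = int(row["year"])

    gaps = defaultdict(list)
    for (dynasty, _pid), offices in offices_by_person.items():
        # Only people recorded in both offices contribute to the average.
        if first in offices and second in offices:
            gaps[dynasty].append(offices[second] - offices[first])
    return {dynasty: mean(values) for dynasty, values in gaps.items()}


print(years_to_promotion(SAMPLE_CSV))
```

People who appear in only one of the two offices are skipped, so the result describes documented promotions only, not the whole population.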
Accessing and Downloading the Database
One of CBDB’s greatest strengths is its commitment to open access. Scholars and students can use it in two primary ways:
- Online Search Portal: The web interface at https://cbdb.fas.harvard.edu/ allows users to browse and query the database by name, office, location, or relationship. It’s ideal for quick lookups or exploratory research.
- Offline Data Download: The complete dataset (updated regularly) is available for free download in MySQL format via the Download section. Researchers can then integrate the data into custom analytical workflows or institutional repositories.
This dual-access model reflects CBDB’s academic mission: to support both accessible online exploration and deep computational research.
Visualization and Innovation
The CBDB project doesn’t stop at text. Its team actively develops visualization tools that translate complex relationships into interactive graphics.
A map of the spatial distribution of 190,000 individuals, classified by home affiliation (籍貫), illustrates how regional zones of influence changed from dynasty to dynasty. Together, such maps and animations form a visual language for examining social change over years and centuries, and they support empirically grounded discussions about class, bureaucracy, and mobility in imperial China.
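Before anything is drawn on a map, a distribution like this comes down to counting individuals by period and place. The Python sketch below shows that aggregation step under stated assumptions: the (dynasty, region) pairs are invented, and "Region X" and "Region Y" are placeholders, not real home affiliations.

```python
from collections import Counter, defaultdict

# Hypothetical (dynasty, home region) pairs, one per individual.
RECORDS = [
    ("Tang", "Region X"), ("Tang", "Region X"),
    ("Tang", "Region X"), ("Tang", "Region Y"),
    ("Song", "Region X"), ("Song", "Region Y"),
    ("Song", "Region Y"), ("Song", "Region Y"),
]


def regional_shares(records):
    """Return each region's share of individuals within every dynasty."""
    counts = defaultdict(Counter)
    for dynasty, region in records:
        counts[dynasty][region] += 1
    return {
        dynasty: {
            region: n / sum(by_region.values())
            for region, n in by_region.items()
        }
        for dynasty, by_region in counts.items()
    }


print(regional_shares(RECORDS))
```

Comparing shares rather than raw counts keeps dynasties with very different numbers of recorded individuals on the same scale.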
CBDB and its partners in the digital humanities have turned the database from a static repository into a dynamic infrastructure for comparative historical research. It brings together traditional sinology and computational science, a pairing that reshapes how global historians think about data.
Challenges and Future Directions
Every historical database has limitations. The CBDB team indicates that coverage remains uneven across dynasties and social classes, and that disambiguating individuals whose names are written with the same or similar Chinese characters remains a challenge.
However, the long-term aim of the project is clear: to incorporate every pertinent biographical record from China’s written past, guided by transparency, replicability, and openness to collaboration with other scholars.
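The name-disambiguation problem can be illustrated with a small Python sketch. It is not the CBDB team's method. It flags names shared by more than one record and then narrows the candidates by an approximate year of activity. The records, the tuple layout, and the 60-year tolerance are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical records: (person_id, name in characters, approximate active year).
PEOPLE = [
    (1, "張三", 1050),
    (2, "張三", 1400),
    (3, "李四", 1055),
]


def ambiguous_names(people):
    """Return names that are shared by more than one record."""
    by_name = defaultdict(list)
    for person_id, name, year in people:
        by_name[name].append((person_id, year))
    return {name: rows for name, rows in by_name.items() if len(rows) > 1}


def resolve(people, name, year, tolerance=60):
    """Return ids of people with this name active within `tolerance` years."""
    return [
        person_id
        for person_id, candidate, active in people
        if candidate == name and abs(active - year) <= tolerance
    ]


print(ambiguous_names(PEOPLE))
print(resolve(PEOPLE, "張三", 1060))
```

A date window alone will not separate contemporaries with the same name, which is why additional evidence such as place of origin or kinship ties is usually needed.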
Future iterations of the database may become more tightly integrated on the back end with GIS systems, offer new interfaces for data visualization, and add more cross-links to other East Asian datasets. As of 2025, the database continues to grow, with new records added each month, and the project’s network, including the CBDB Conferences and Workshops, creates opportunities for international exchange and collaboration.
Why the China Biographical Database Matters
The China Biographical Database isn’t just another digital archive. It represents a paradigm shift in historical scholarship — one that turns narrative history into a living, analyzable system of relationships and data.
For social scientists, the China Biographical Database offers a way to test hypotheses about power, mobility, and social capital.
Historians gain a new lens to see how individuals shaped dynasties, ideologies, and communities. Meanwhile, for digital humanists, it stands as a model of how data can illuminate—rather than replace—the human story.
By bridging centuries of records and the modern analytical toolkit, CBDB proves that even the oldest histories can become engines of innovation.
Expert Sources and References
- Harvard University, Academia Sinica, and Peking University. China Biographical Database (May 2025) — https://cbdb.hsites.harvard.edu/
- About the Project: https://cbdb.hsites.harvard.edu/about-us
- Methodology: https://cbdb.hsites.harvard.edu/methodology
- Sources and Coverage: https://cbdb.hsites.harvard.edu/sources-and-coverage
- Conferences and Workshops: https://cbdb.hsites.harvard.edu/conferences-and-workshops
- Download Page: https://cbdb.hsites.harvard.edu/download
- Overview of CBDB: https://handbook.pubpub.org/pub/case-cbdb/release/9