Illustration of the Xunzi artificial intelligence large language model Photo: njau.edu.cn
A university research team from East China’s Jiangsu Province has recently released China’s first large language model (LLM) designed to help conduct research on ancient Chinese books. An LLM is a type of artificial intelligence (AI) algorithm that uses deep learning techniques and massive data sets.
The LLM for ancient books was designed to intelligently process ancient texts, promote innovative development in the research and preservation of ancient Chinese books, improve the efficiency and quality with which traditional Chinese culture is passed on, and facilitate deep integration between LLMs and the processing of ancient books.
The LLM “Xunzi,” named after Xun Zi, one of the most famous philosophers of ancient China and known for the Confucian classic Xunzi, covers the vast majority of ancient Chinese books and documents, including the collections of the “Complete Library in Four Sections” or “Siku Quanshu,” forming a large-scale corpus of over 2 billion Chinese characters and words.
Research on traditional Chinese classics is painstaking and laborious work even for scholars and experts, let alone for average learners. Thus, translating ancient texts into modern Chinese is one of the model’s most important functions, Wang Dongbo, a professor at the College of Information Management of Nanjing Agricultural University in Nanjing, Jiangsu, who led the research team, told the Global Times.
With the model, researchers can swiftly summarize ancient texts and grasp the themes of ancient books. The model can also extract key information from the texts, such as people, events and places, so that the information can be organized efficiently.
The model can also automatically generate poems in the ancient style that comply with grammar and prosody rules based on prompts from users, providing inspiration for poetry lovers. It can also precisely translate ancient texts into modern Chinese to help researchers understand their original meaning and connotation.
Led by Wang, the research team has been working on the digitization of ancient books and documents for a decade. Supported by the university’s strong computing power and drawing on application scenarios provided by Zhonghua Book Company, the team built China’s first open-source LLM for ancient texts.
The LLM has been published on websites such as github.com and modelscope.cn as open-source software, allowing users to download and use it for free.
“We trained Xunzi on big data built from ancient books that can be obtained for free on the internet, much as OpenAI trained ChatGPT. Although we put great effort, manpower and money into it, we still share it for free with the aim of encouraging more people to study and pay attention to traditional Chinese culture,” Wang said.