A Shenzhen-based artificial intelligence (AI) startup is creating ripples in the global tech sphere with an innovative language model.
This open-source large language model (LLM), known as RWKV, has been likened by industry insiders to the Android operating system for smartphones. It combines the strengths of the Transformer and RNN (recurrent neural network) architectures, offering features not found in the widely used Transformer architecture, which underpins OpenAI's ChatGPT. (The Transformer, a neural network architecture introduced in 2017, is widely recognized for laying the foundation for most of the large language models in use today.)
Luo Xuan (L), COO and co-founder of Shenzhen Yuanshi Intelligence Co., demonstrates an RWKV-based music-generating application.
RWKV, which was open-sourced as the world's first non-Transformer 7B model in early 2020, stands out for its consistently high inference efficiency, constant and low memory usage, support for unlimited context length, and chip-friendly design, enabling it to significantly reduce computational costs compared with traditional Transformer-based models.
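For readers with a programming background, the claim about constant memory can be made concrete with a toy sketch. The Python snippet below is not RWKV's actual code; it is a hypothetical, simplified comparison showing why a Transformer-style decoder must keep a cache that grows with every past token, while an RNN-style decoder of the kind RWKV uses carries only a fixed-size state from one token to the next.

```python
# Illustrative sketch only (not RWKV's real implementation): why a recurrent
# formulation gives constant memory per generated token, while a
# Transformer-style key-value cache grows with context length.

class TransformerStyleDecoder:
    """Caches a key/value entry for every past token, so memory and
    per-token work grow with the number of tokens processed."""
    def __init__(self):
        self.kv_cache = []          # one (key, value) entry per past token

    def step(self, token_embedding):
        self.kv_cache.append((token_embedding, token_embedding))
        # Attention would scan the whole cache here: O(n) work per token.
        return sum(v for _, v in self.kv_cache) / len(self.kv_cache)


class RecurrentStyleDecoder:
    """Keeps a single fixed-size state, so memory and per-token work
    stay constant no matter how long the context gets."""
    def __init__(self, decay=0.9):
        self.state = 0.0            # fixed-size running summary of the past
        self.decay = decay

    def step(self, token_embedding):
        self.state = self.decay * self.state + (1 - self.decay) * token_embedding
        return self.state           # O(1) work and memory per token


if __name__ == "__main__":
    t, r = TransformerStyleDecoder(), RecurrentStyleDecoder()
    for i in range(10_000):
        t.step(float(i % 7))
        r.step(float(i % 7))
    print("cache entries held by the Transformer-style decoder:", len(t.kv_cache))
    print("state values held by the recurrent-style decoder:    1")
```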
According to the Chatbot Arena Leaderboard, a benchmark developed by members of LMSYS and UC Berkeley SkyLab to rank LLMs, RWKV's language model Raven-14B surpassed well-known projects such as Alpaca-13B and ChatGLM-6B to rank sixth in the leaderboard's May 10, 2023 ratings, behind only OpenAI's GPT-4, Anthropic's Claude-v1, OpenAI's GPT-3.5-turbo, Vicuna-13B, and Koala-13B.
A screenshot of Chatbot Arena Leaderboard's ratings in May 2023.
Additionally, RWKV took top honors at a hackathon (a term coined from "hacker" and "marathon," referring to a programming competition) held in Shenzhen in April. The model has fostered over 400 projects, some of which have secured vital funding and turned a profit by building on RWKV.
Origin
The story behind RWKV, pronounced "RWaKuV," is one of passion and coincidence. It stemmed from developer Peng Bo's fascination with AI-generated novels. To tackle the challenges of long-text generation, Peng ingeniously reworked the RNN architecture into a more efficient framework, marking a turning point in AI model development.
Holding a degree in physics from the University of Hong Kong, Peng was building a business in Shenzhen's artificial intelligence of things (AIoT) sector after several years of quantitative trading at a Hong Kong hedge fund.
Shortly after RWKV's debut, its exceptional performance quickly drew attention within the industry. On Feb. 3, Peng received an invitation email from OpenAI, in which the U.S. tech giant expressed interest in RWKV and asked whether he would like to work for the company. Driven by his passion for building an open-source AI model that could benefit a wider group of people, Peng said in his reply: "OpenAI is great, but I like building open AI; let me know if OpenAI plans to build a community project one day."
Following the Shenzhen hackathon in April, CEO Peng and COO Luo Xuan, an experienced AI professional who had previously organized 15 hackathons, teamed up to establish Shenzhen Yuanshi Intelligence Co. and move RWKV toward commercialization.
Growth
The model is now in its sixth iteration, and plans are underway for a seventh this year, with the aim of expanding parameter counts, potentially to a trillion, and nurturing a robust ecosystem, according to Luo. The company's team of 30, about 40% of whom are part-timers and interns, focuses predominantly on R&D.
"We are not just a tech company. We have a vast ecosystem," Luo told Shenzhen Daily reporters in an exclusive interview. "A large number of developers and companies are using our open-source model. More than 20,000 developers domestically and internationally are voluntarily helping us improve it, and nearly 400 open-source projects are utilizing it."
Having secured seed investment within six months of its founding, the company is in talks for additional funding in an angel round, Luo said.
The newly minted company has built a diverse clientele across the business-to-business and business-to-consumer sectors, generating revenue since last year. It is collaborating with the State Grid Corp., and its on-device applications, ranging from music generation to novel writing, have also attracted a user base.
"What sets RWKV apart is its efficiency in computing, reducing the need for extensive computing power compared to traditional large model architecture," Luo said. "This efficiency enables RWKV to operate effectively on various devices, from smartphones and PCs to robots and even electric vehicles, incomparable by other architecture."
Challenges
Moving forward, the company plans to expand RWKV applications into sectors such as new energy, humanoid robots, and consumer devices. By working with robotics firms to craft versatile robots adaptable to multiple scenarios, the company aims to realize Luo's vision of a future in which the same humanoid robot can work in a factory, sort packages at a warehouse, and take on household chores.
Addressing the challenges faced by AI startups, Luo highlighted the escalating cost of the computational power used to train AI models. As computational demands rise worldwide, the expense of training AI models has been soaring. According to an analysis in Stanford University's 2024 Artificial Intelligence Index Report, last year OpenAI's GPT-4 cost an estimated US$78.4 million to train, while Google's AI model, Gemini Ultra, cost a staggering US$191 million.
"Many AI companies found they failed to bring in profits and it was chipmakers that have profited the most from the AI craze," said Luo.
A leading manufacturer of the graphics processing units (GPUs) typically used in AI, NVIDIA briefly overtook Microsoft to become the world's biggest public company on June 18. With the two firms and Apple each holding market caps of around US$3.2 trillion, their rankings within the top three could shift frequently in the near term.
Hou Xiaowan (L), global communications manager at Shenzhen Yuanshi Intelligence Co., shows a potential user how to generate music with RWKV Music, an application based on the RWKV large model, at an event in Shenzhen last month.
Addressing the future landscape of the AI industry, Luo cautioned that reckless expansion of computing capacity and stockpiling of GPUs may lead to losses, as new types of chips designed specifically for AI models may emerge.
Unlike conventional cloud-dependent models, RWKV can run on-device, making its operation more efficient, secure, and cost-effective, according to Luo. These advantages have drawn the attention of some overseas researchers and leading AI companies, he said.
"Researchers from Stanford University, Massachusetts Institute of Technology and Carnegie Mellon University have discussed RWKV in their recently published papers, and AI giants like Microsoft, Google and Meta have also released language models similar to RWKV," he said.
The Shenzhen company is also collaborating with international companies, particularly cloud platforms and chip manufacturers such as Qualcomm, MediaTek, Intel, and AMD, to expand its global reach.
In the evolving AI landscape, Shenzhen has emerged as a hub for innovation and entrepreneurship, offering unique opportunities for AI startups, Luo said. "Shenzhen has a complete AI ecosystem, encompassing software, hardware, models, cloud services, application development, and global hardware sales, all interwoven into a comprehensive whole," he said.
He believes Shenzhen has an opportunity to come out ahead in this AI wave by integrating edge models with smart hardware. "Shenzhen's greatest asset, which makes it ideal for entrepreneurs, is the entrepreneurial atmosphere it fosters," he said. "Here, everyone is young, striving daily for a better tomorrow."