(Image source: Photo by TMTPost App editor Lin Zhijia)
TMTPOST -- MiniMax, a Chinese unicorn developing large AI models, will release a Realtime API service in November, positioning it against GPT-4o, which OpenAI released in May. The service is designed to strengthen end-to-end real-time multimodal processing and to deliver lower-latency, more natural and immersive real-time voice conversations for scenarios such as enterprise collaboration, social networking, live streaming, and gaming.
This is MiniMax's first end-to-end real-time voice conversation product. Insiders told TMTPost App that the company is still refining it internally and hopes its performance will compete directly with OpenAI's GPT-4o when it launches in November.
GPT-4o, launched by OpenAI, is available for free and can reason across audio, vision, and text in real time. It can respond to audio input in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response times in conversation. Through the API, GPT-4o costs half as much as GPT-4 Turbo, released last November, and runs twice as fast.
OpenAI CEO Sam Altman said in a post on X that GPT-4o is the best model OpenAI has ever built: it is intelligent, fast, natively multimodal, and available to all ChatGPT users, on both the free plan and paid subscriptions.
In October this year, Agora, a real-time voice and video technology company, appeared as a voice API partner in the public beta of OpenAI's Realtime API. MiniMax saw an opportunity as well and began working with Agora. Zhao Bin, founder and CEO of Agora, said at the 10th Real-Time Internet Conference (RTE 2024) that Agora and MiniMax are refining China's first Realtime API, and that products built on it will be able to hold smooth, natural real-time voice conversations with people.
Besides MiniMax, other Chinese companies such as iFlytek, Zhipu AI, and SenseTime are also developing generative AI dialogue products, which they position as comparable to GPT-4o in performance. OpenAI, for its part, has recently opened up GPT-4o voice conversations in ChatGPT.
According to iResearch, the core conversational AI market was worth 4.5 billion yuan in 2021 and drove a related market of 12.6 billion yuan. By 2026, the core market is expected to reach 10.8 billion yuan and to drive a related market of more than 38.5 billion yuan, with a five-year compound annual growth rate (CAGR) of 32.5%.
(Author|Lin Zhijia, Editor|Hu Runfeng)