Zhidx (智東西)
Author: Wang Han (王涵)
Editor: Mo Ying (漠影)
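Zhidx, August 7: Today, MiniMax released Speech 2.5, its new-generation speech generation model.
Compared with Speech 02, released in May, Speech 2.5 makes three advances: more natural multilingual delivery, more faithful voice cloning, and broader coverage, now spanning 40 languages.
Speech 2.5 is now available worldwide. Users can try it by logging in to the MiniMax Open Platform or the MiniMax Audio website:
MiniMax Open Platform: minimaxi.com/platform_overview
MiniMax Audio: minimaxi.com/audio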
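▲ Speech 2.5 homepage
On the Speech 2.5 homepage, users pick the voice they want, then type text into the input box or upload a file, and generate the audio in one click. Below are the official Speech 2.5 demos together with samples from Zhidx's own hands-on testing: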
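1. Natural multilingual expression with less of a mechanical feel
MiniMax Speech 2.5 improves the similarity and prosodic naturalness of generated audio, lowers the character error rate, and makes AI-generated business meetings, everyday conversations, and English podcasts sound less mechanical.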
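For readers unfamiliar with the metric, character error rate is conventionally computed as the character-level edit distance between a reference transcript and a transcript of the generated audio, divided by the length of the reference. The Python sketch below illustrates only that standard definition. The article does not say how MiniMax measured it, and the function names and sample strings here are made up for illustration.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the number of reference characters."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: the text sent to the model versus what a
# speech recognizer heard in the generated audio.
intended = "selamat datang"
recognized = "selamat datank"
print(f"CER = {character_error_rate(intended, recognized):.3f}")
```

A lower value means the generated speech was transcribed closer to the intended text. Published figures also depend on which speech recognizer and test set are used, which this sketch does not model.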
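In Zhidx's testing, Speech 2.5 could also add ambient scene sound to the audio, for example an American high school girl giving a speech over a broadcast system: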
https://oss.zhidx.com/c8ca347142979c4c9b0275ec7c968c4e/68937c00/uploads/2025/08/68947cbca223f_68947cbc9335e_68947cbc9332e_%E7%BE%8E%E5%9B%BD%E5%B0%91%E5%A5%B3%E5%B9%BF%E6%92%AD%E6%BC%94%E8%AE%B2.mp3
Audio content: Two years is nothing, but at the same time a lot can be accomplished in two years. You can try a sport you’ve always wanted to start, and become great at it. You can start a morning routine and affect your mood and stress at a deep level. You can meditate for a few minutes per day, become more self-aware and change the way you react to problems. You can start a business and make it a big success.
The generated audio not only reads the text clearly and accurately, but also has the pauses and intonation of a native speaker.
Hamlet swearing his oath of revenge:
https://oss.zhidx.com/8bf418d721c03169570590d39c883d98/68937c00/uploads/2025/08/689452e5721a9_689452e56e7c6_689452e56e79a_%E7%AB%8B%E4%B8%8B%E5%A4%8D%E4%BB%87%E8%AA%93%E8%A8%80%E7%9A%84%E5%93%88%E5%A7%86%E9%9B%B7%E7%89%B9.mp3
Audio content: Remember? Yea, from the tables of my memory, I’ll wipe away all trivial fond records. All saws of books, all forms, all pressures past, that youth and observation copied there. And then commandment all alone shall live within the book and volume of my brain, unmixed with baser matter. Yes, yes by heaven.
Or, as another example, a passionate Spanish sports commentator:
https://oss.zhidx.com/5483bff4b111fd564b365c075fc2828d/68937c00/uploads/2025/08/68947e6f5f179_68947e6f5bba0_68947e6f5bb70_%E5%85%85%E6%BB%A1%E6%BF%80%E6%83%85%E7%9A%84%E8%A5%BF%E7%8F%AD%E7%89%99%E4%BD%93%E8%82%B2%E8%B5%9B%E4%BA%8B%E8%A7%A3%E8%AF%B4%E5%91%98.mp3
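Audio content: ¡Arranca el genio por la derecha, deja atrás a uno, se saca de encima al segundo, entra al área, prepara el remate… ¡GOLAZO MONUMENTAL! ¡Una obra de arte que sella la victoria y desata la locura total!
2. Cross-lingual accent cloning that preserves the original voice
Speech 2.5 can also clone accents across languages, preserve regional accents within the same language, and retain the vocal characteristics of particular age groups. Users are free to choose the voice they want.
In Zhidx's testing, a "domineering CEO" voice delivers the emperor's classic lines from the drama Empresses in the Palace (甄嬛傳):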
https://oss.zhidx.com/fe787f468b25b4ed933658e514374a84/68937c00/uploads/2025/08/689478c44acd9_689478c445a5a_689478c445a19_%E9%9C%B8%E9%81%93%E6%80%BB%E8%A3%81%E5%A3%B0%E7%BA%BF%E8%AF%B4%E7%94%84%E5%AC%9B%E4%BC%A0%E5%8F%B0%E8%AF%8D.mp3
Audio content: 嬛嬛一裊楚宮腰,那更春來香減玉消。紫禁城的風水養人,必不會叫你玉減香消。
What would it sound like to introduce the new Speech 2.5 in the classic accent of the Queen of England?
https://oss.zhidx.com/ea597c9308a5660833ba2d22bb7d9b35/68937c00/uploads/2025/08/68945359ae9bb_68945359a911a_68945359a90c6_%E8%8B%B1%E5%9B%BD%E5%A5%B3%E7%8E%8B%E7%9A%84%E7%BB%8F%E5%85%B8%E5%8F%91%E9%9F%B3.mp3
Audio content: Hello everyone. We’re thrilled to introduce the next generation of our voice model: MiniMax Speech 2.5. Building on its predecessor, Speech 2.0, this new version is more powerful than ever. But where it truly shines is in its incredible realism. The model masterfully captures the subtle nuances of the human voice, from trailing intonation and vocal style, to the full spectrum of emotion, all reproduced with stunning authenticity.
From pauses and rhythm to pronunciation, the generated speech keeps an authentic "Queen's English" accent.
Cross-lingual cloning works too. Zhidx had Speech 2.5 use the voice of a hot-blooded Korean manhwa hero to recite the "美美桑內" lyrics, switching between Korean and English:
https://oss.zhidx.com/a64f1d5f97c6f960bb11d0eb655ac21f/68937c00/uploads/2025/08/68947938d5422_68947938cfa2f_68947938cf9fa_%E7%83%AD%E8%A1%80%E9%9F%A9%E6%BC%AB%E7%94%B7%E4%B8%BB%E5%A3%B0%E7%BA%BF%E8%AF%B4%E2%80%9C%E7%BE%8E%E7%BE%8E%E6%A1%91%E5%86%85%E2%80%9D.mp3
Audio content: ???? ??,???? ??,never stop burn it,? ?? ??? oh you know?
The same voice switching between Italian and English:
https://oss.zhidx.com/6233b4848810ccc8579a8afa1f63daaf/68937c00/uploads/2025/08/6894537784b59_689453777ef17_689453777eee9_%E6%84%8F%E5%A4%A7%E5%88%A9%E8%AF%AD%E8%8B%B1%E8%AF%AD%E5%88%87%E6%8D%A2.mp3
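Audio content: Questa è la mia vera voce. I find speaking English a bit difficult. It’s like trying to speak Italian without using hand gestures.
When switching between languages, the content generated by Speech 2.5 still retains the distinctive details of the accent.
3. Several less widely spoken languages added, bringing the total to 40
Speech 2.5 adds Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, and several other less widely spoken languages, bringing the number of supported languages to 40. For cross-border e-commerce, overseas customer service, and localized marketing, global content can be created in one click.
Malay, for example: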
https://oss.zhidx.com/4d20b9aeff9f8db4af88a605506328d6/68937c00/uploads/2025/08/689453a561e81_689453a55c370_689453a55c348_%E9%A9%AC%E6%9D%A5%E8%AF%AD.mp3
Audio content: Selamat datang, semoga hari anda indah.
Hebrew:
https://oss.zhidx.com/4faac7c2307610b77b2fa04b7f719d27/68937c00/uploads/2025/08/689453b42355d_689453b41f914_689453b41f8e2_%E5%B8%8C%E4%BC%AF%E6%9D%A5%E8%AF%AD.mp3
Audio content: .?????? ??????? ???? ???
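4. Boosting cross-border business: Ximalaya and NetEase are already users
The MiniMax Speech models can be applied in many scenarios, such as multilingual customer service, voice-overs for international advertising, cross-border education, and cross-border e-commerce.
MiniMax Speech models are already widely adopted around the world. Overseas, agent platforms such as Vapi and Pipecat use MiniMax Speech to provide their services, and leading AI applications such as Hedra, Icon, and Syllaby have also integrated it.
In China, leading platforms and products including Gaotu Education (高途教育), Ximalaya, NetEase, and Rokid glasses have all chosen MiniMax Speech.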
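Conclusion: MiniMax keeps digging deeper into AI audio
MiniMax is no newcomer to AI audio. Speech 02, which it released in May this year, outperformed well-known models from OpenAI, ElevenLabs, and others on two speech benchmark leaderboards, Artificial Analysis and Hugging Face TTS Arena, taking first place on both.
Speech 2.5 can be seen as an advanced version of Speech 02: it builds on its predecessor's strengths and further optimizes multilingual performance, voice cloning, and language coverage.
With many companies and research institutions now entering the field, competition in AI audio is intensifying, and the release of MiniMax Speech 2.5 injects new energy into the market.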