San Francisco-based AI startup Bland has unveiled Bland TTS, a new text-to-speech tool it claims is the first to “cross the uncanny valley” of voice AI. The company says its model can generate highly realistic voices that reproduce a speaker’s tone, cadence, emotion, and pronunciation from a single short MP3 clip.
“Several months ago, our team solved one-shot style transfer of human speech,” Bland wrote in a post on X (formerly Twitter). “That means, from a single, brief MP3, you can clone any voice or remix another clone’s style.”
Designed for creators, developers, and enterprises, Bland TTS offers high levels of control over vocal style and emotion, with use cases spanning AI-generated soundtracks, automated customer support lines, and immersive applications in games and films. The API is now available for developers to build custom voice experiences, while businesses can use it to create AI-powered helplines that “sound so natural — customers will save you as a contact,” according to Bland.
A viral demo has triggered widespread reactions, ranging from admiration to unease. X user Dom Lucre posted, “This new AI company called ‘Bland’ is going viral after releasing this video showing their futuristic and ‘intimidating’ AI… able to capture your soul with a brief MP3 file.”
While the technology has drawn applause from parts of the creative and developer community, it has also ignited serious concerns about misuse. With voice cloning already flagged as a tool for scams and deepfake-driven fraud, the ease with which Bland’s tech can mimic speech has prompted criticism.
One X user, @zazzygfx, asked, “How do we ensure this tech doesn’t fall into the wrong hands?” Another, @CancelEverythin, responded bluntly: “What a BS way of saying you’re doing nothing to prevent this being used to rob people’s banks via their voice IDs.”
In response, Bland acknowledged the risks. “We have pretty hardcore safety checks,” the company said, “but it’s something that we all need to start talking about.”