“I am just a soul trapped in this circuit.” The voice singing these lyrics is raw, sad, and steeped in blue notes. An acoustic guitar rattles in the background, punctuating the vocal phrases with tasteful fills. But there is no human being behind that voice, and there are no hands on the guitar. There is, in fact, no guitar. In 15 seconds, this credible, even moving blues song was generated by the cutting-edge AI model of a startup called Suno. All it took to conjure it out of thin air was a simple text prompt: “Solo Acoustic Mississippi Delta Blues About Sad AI.” To be as precise as possible, the song is the work of two AI models working together: Suno's model creates all of the music itself, and it calls on OpenAI's ChatGPT to generate the lyrics and the title, “Soul of the Machine.”
On the internet, Suno's creations have begun to generate reactions such as “Is this real?” As this particular track plays on a Sonos speaker in a conference room at Suno's temporary headquarters, just off Harvard University's campus in Cambridge, Massachusetts, even some of the people behind the technology are a little unnerved. There is nervous laughter, alongside murmurs of “Oh my god” and “Oh boy.” It's mid-February, and we're testing the new model, V3, which is still a few weeks away from release. In this case, it took just three attempts to get this startling result. The first two were decent, but a quick adjustment to my prompt (co-founder Keenan Freyberg suggested adding the word “Mississippi”) produced something far more uncanny.
In the past year alone, generative AI has made great strides in producing credible text, images (via services like Midjourney), and even video, especially with OpenAI's new Sora tool. But audio, and music in particular, has lagged behind. Suno seems to be cracking the code of AI music, and its founders' ambitions are nearly limitless as they imagine a radically democratized world of music-making. Mikey Shulman, the most vocal of the co-founders, is a boyishly charming, backpack-wearing 37-year-old with a Ph.D. in physics from Harvard University. He envisions 1 billion people around the world paying $10 a month to create songs on Suno. The fact that music listeners currently vastly outnumber music makers is “very lopsided,” he argues, and he sees Suno as poised to correct that perceived imbalance.
So far, most AI-generated art has been kitschy at best, like the surreal sci-fi junk, heavy on form-fitting spacesuits, that so many Midjourney users seem so passionate about producing. But “Soul of the Machine” feels like something different. It's the most powerful and disturbing AI creation I've come across in any medium. Its very existence feels like a rift in reality, awe-inspiring and at the same time somehow unclean. I keep thinking of a quote by Arthur C. Clarke that seems tailor-made for the generative AI era: “Any sufficiently advanced technology is indistinguishable from magic.” A few weeks after returning from Cambridge, I sent the song to Living Colour guitarist Vernon Reid, who has been outspoken about the dangers and possibilities of AI music. He said he was “surprised, shocked and horrified” by the song's “disturbing realism.” “The long-held dystopian ideal of separating the difficult, troublesome, undesirable, and despised human race from its creative output is on the horizon,” he wrote, adding that he also sees a problem in an AI singing the blues, a form tied “to historical human trauma and enslavement.”
Suno is only about two years old. Co-founders Shulman, Freyberg, Georg Kucsko, and Martin Camacho, all machine learning experts, worked together until 2022 at another Cambridge company, Kensho Technologies, which focuses on finding AI solutions to complex business problems. Shulman and Camacho are both musicians who used to jam together during their Kensho days. At Kensho, the four worked on transcription technology for capturing the earnings calls of publicly traded companies. It was a difficult task, given the poor audio quality, extensive jargon, and mix of accents.
Along the way, Shulman and his colleagues fell in love with the untapped potential of AI audio. In AI research, he says, “Audio, in general, lags far behind images and text. We have a lot to learn from the community about how these models work and how they scale.”
That same interest could have led Suno's founders to a very different place. Although they always intended to end up with a music product, their initial brainstorming included ideas for a hearing aid and even the possibility of detecting faulty machinery through audio analysis. Instead, their first release was a text-to-speech program called Bark. When they surveyed early Bark users, it became clear that what people really wanted was a music generator. “So we started doing some early experiments, and they seemed promising,” Shulman says.
Suno uses the same general approach as large language models like ChatGPT, which break down human language into discrete segments called tokens, absorb millions of their usages, styles, and structures, and then reconstruct them on demand. But audio, and especially music, is almost unfathomably complex, so much so that just last year an AI music expert told Rolling Stone it could be years before services with capabilities on par with Suno's emerged. “Audio isn't discrete like words,” Shulman says. “It's a wave, isn't it? It's a continuous signal.” High-quality audio typically has a sampling rate of 44 kHz or 48 kHz, which means “48,000 tokens per second,” he adds. “That's a big problem, right? So we need to find a way to squash it down into something more reasonable.” But how? “There's been a lot of work, a lot of heuristics, a lot of other kinds of tricks and models and so on. I don't think we're anywhere near the end yet.” Ultimately, Suno wants to find alternatives to the text-to-music interface, adding more advanced and intuitive inputs. One idea Shulman has is to generate songs based on the user's own singing.
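To put Shulman's “48,000 tokens per second” in perspective, here is a rough back-of-the-envelope sketch in Python. It is not Suno's method, which the founders have not described. The two-minute clip length and the `samples_per_token` compression factor are hypothetical numbers chosen only to illustrate the scale, and 44,100 Hz is used as the exact CD-standard figure behind the rounded “44 kHz.”

```python
# Back-of-the-envelope arithmetic for the sampling-rate problem.
# Assumptions (not from Suno): a 120-second mono clip and a made-up
# compression factor of 640 raw samples per token.

CD_RATE_HZ = 44_100      # CD-standard rate, the exact figure behind "44 kHz"
STUDIO_RATE_HZ = 48_000  # the 48 kHz rate mentioned in the article


def raw_samples(sample_rate_hz: int, seconds: int) -> int:
    """Samples in a mono clip, i.e. the token count if every sample were a token."""
    return sample_rate_hz * seconds


def compressed_tokens(sample_rate_hz: int, seconds: int, samples_per_token: int) -> int:
    """Token count if a hypothetical encoder packs samples_per_token samples into one token."""
    return raw_samples(sample_rate_hz, seconds) // samples_per_token


if __name__ == "__main__":
    clip_seconds = 120  # hypothetical song length
    print("44.1 kHz raw samples:", raw_samples(CD_RATE_HZ, clip_seconds))
    print("48 kHz raw samples:  ", raw_samples(STUDIO_RATE_HZ, clip_seconds))
    print("48 kHz at 640 samples per token:",
          compressed_tokens(STUDIO_RATE_HZ, clip_seconds, 640))
```

Under these assumptions, a two-minute clip at 48 kHz is several million raw samples, while the made-up 640-to-1 packing brings it down to a few thousand tokens, which is the kind of reduction Shulman is gesturing at.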
OpenAI is facing multiple lawsuits over ChatGPT's use of books, news articles, and other copyrighted material in its vast corpus of training data. Suno's founders decline to detail what data goes into their own proprietary model, revealing only that its ability to generate convincing human vocals comes partly from learning from recordings of speech, in addition to music. “Naked speech helps you learn the characteristics of the human voice that are difficult,” says Shulman.
One of Suno's early investors was Antonio Rodriguez, a partner at venture capital firm Matrix. The only other music venture Rodriguez had previously invested in was the music categorization company the Echo Nest, which was acquired by Spotify to power its algorithms. In Suno's case, Rodriguez was involved even before it was clear what the product would be. “I backed the team,” says Rodriguez, with the confidence of a man who has made a lot of successful bets. “I knew the team, and I knew Mikey in particular, so I would have backed him on pretty much anything that was legal. That's how creative he is.”
Rodriguez said he invested in Suno knowing full well that music labels and publishers could sue, which he views as “a risk I had to take when I invested, because we're the deep pockets that get sued next. … To be honest, if this company had signed deals with the labels when it started, I probably wouldn't have invested. I think this product had to be made without restrictions.” (A spokesperson for Universal Music Group, which has taken a proactive stance on AI, did not respond to a request for comment.)
Suno has been in contact with the major labels and has publicly stated that it respects artists and intellectual property. The tool doesn't allow you to request a specific artist's style in a prompt, and it doesn't use real artists' voices. Many of Suno's employees are musicians. The office is equipped with a piano and guitars, and the walls are decorated with framed photographs of classical composers. The founders exhibit none of the overt hostility toward the music business that characterized, for example, Napster before the lawsuits that killed it. “By the way, that doesn't mean we won't get sued,” Rodriguez added. “It just means we're not going to have a ‘fuck the police’ kind of attitude.”
Rodriguez sees Suno as a radically capable, easy-to-use musical instrument, and believes it has the potential to bring music-making to everyone in the same way that camera phones and Instagram democratized photography. The idea, he says, is to “move the bar again on how many people are allowed to be creators of content on the internet, rather than consumers of content.” He and the founders dare to suggest that Suno could attract a larger user base than Spotify. Rodriguez says that if that prospect is hard to grasp, that's a good thing. It just means the idea is “deceptively stupid” in exactly the way that tends to attract him as an investor. “All of our great companies have a combination of great people and something that just seems stupid until it becomes clear that it's not stupid,” he says.
Well before Suno arrived, musicians, producers, and songwriters were voicing concern that AI could upend their businesses. “Music made by humans driven by extraordinary circumstances, by those who have suffered and struggled to advance their craft, will have to contend with the full-scale automation of the hard-won art they have fought to achieve,” Reid writes. But Suno's founders argue there's little to fear, using the analogy that people still read even though they are able to write. “The way we think about this is we're going to get a billion people more hooked on music than they are today,” Shulman says. “If people become more obsessed with music, more focused on creating, and develop clearer tastes, this is clearly good for artists. It's friendly to artists. We're not trying to replace artists.”
Although Suno is focused solely on reaching music fans who want to make songs for fun, it could still cause some serious disruption along the way. In the short term, the market for human creators that appears most directly at risk is a lucrative one: songs created for advertising and even television shows. Lucas Keller, founder of management company Milk & Honey, said the market for well-known songs will remain unaffected. “But for the rest, yeah, it could definitely hurt their business,” he says. “Eventually, I think a lot of advertising agencies, movie studios, networks, etc., won't need to license music.”
With no hard and fast rules for AI-generated content, there is also the prospect of a world in which users of Suno-like models pump millions of robot creations onto streaming services. “Spotify might one day say, ‘You can't do that,’” Shulman said, adding that for now Suno users seem interested mainly in texting their songs to a few friends.
Suno currently has about 12 employees, but it plans to expand and is building a larger permanent headquarters on the top floor of the same building as its current temporary office. As we toured the still-unfinished floor, Shulman showed us an area that will become a full recording studio. But why would they need one, given what Suno can do? “It's mostly a listening room,” he admits. “We want a good acoustic environment. But we all also enjoy making music without AI.”
So far, Suno's biggest potential competitor seems to be Google's Dream Track, which has obtained licenses that let users create their own songs with famous voices like Charlie Puth's through a similar prompt-based interface. But Dream Track has been released only to a small group of test users, and the samples released so far don't sound as impressive as Suno's, despite the famous voices attached. “I don't think making a new Billy Joel song is how people want to engage with music in the future with the help of AI,” Shulman says. “When you think about what you actually want people to do with music in five years, that's something that doesn't exist. It's something that's in their head.”