Cantai now sings straight from Dorico


With the release of Dorico 6.2.20 yesterday, Cantai — the vocal synthesis plugin from composer Richard deCosta and the Turing Opera Workshop — officially landed in Dorico for the first time. The update introduces two key improvements for Cantai users: automatic voice-part detection via a MIDI handshake between Dorico and the plugin, and improved handling of lyrics in repeated sections. Cantai has been available as a web-based renderer and as a plugin for MuseScore Studio for some time; its arrival in Dorico marks a significant expansion of its reach into the notation software world.

But what is it actually like to use? Over a two-week beta period, I put Cantai through its paces in Dorico — working with solo voices, choral sections, and a range of repertoire from hymns to song cycles. What follows is a hands-on guide to getting started, along with an assessment of where Cantai delivers and where it still has work to do.

Background

For as long as notation software has had playback, users writing vocal music have been disappointed with how the voice is rendered.

In 1991, the General MIDI specification included Choir Aahs and Voice Oohs in its standard set of sounds, and they have since become a familiar nuisance for many notation users. In the wider audio world, Yamaha’s Vocaloid has been around since 2004, synthesizing formants and consonants to create virtual vocal performances with text. More modern vocal plugins like EastWest’s Hollywood Choirs, Emvoice One, and Synthesizer V Studio have made it possible to create high-quality virtual vocal performances across a variety of genres.

Thanks to VST support, these plugins can be used in notation software, as we’ve covered previously on Scoring Notes in tutorials on EWQL Symphonic Choirs, WordBuilder with Sibelius, and Emvoice One with Dorico. Like big orchestral sample libraries, though, these vocal plugins are designed and built to be used in digital audio workstations (DAWs), and connecting them meaningfully to notation software requires a lot of fiddling, tweaking, and trial and error. NotePerformer from Wallander Instruments brought high-quality, fuss-free orchestral sounds to notation software, and Cantai aims to do the same for vocal sounds, according to its creator Richard deCosta.

Cantai and NotePerformer make an excellent pair.

Simple setup

Using Cantai in Dorico is very straightforward, though slightly more manual than NotePerformer. Once it’s installed, head over to Dorico’s Play mode. In the VST and MIDI tab, add one instance of Cantai for each vocal or choral part: click the plus sign, and in the dropdown, select Turing Opera Workshop > Synth > Cantai. This first step loads the plugin into your project. While it’s not required, you’ll do yourself a favor if you take an extra moment to click the gear icon (Endpoint Setup) and name each instance something meaningful, like the voice part or character name (e.g., “Soprano 2” or “Dr. Pencilpusher”).


With the Cantai instances loaded, you can assign each one to a different singer in the score. Switch to the Track Inspector tab (still in Play mode), and select one of your voice tracks. In the dropdown menu under Routing, you should see each instance of Cantai you’ve added (see why I recommended naming them?); choose the instance intended for that part. Repeat this for each vocal or choral part. Note that different parts cannot share an instance of Cantai, even if they’re the same voice type (e.g., Tenor 1 and Tenor 2), because Cantai is a monophonic plugin, unlike most of the instrumental plugins you may be used to.

The last step in setting up your score is to select which of Cantai’s voices you want to use for each part. Still in the Track Inspector, click the e icon (Edit Instrument) to open Cantai’s plugin window. From there, you can select a solo voice or choral section. The solo voices are all distinct, with some being a bit brighter or darker, or carrying a bit more or less vibrato. The Cantai website has a page with rough descriptions of a few of the voices, but it leaves out most of the more recently added ones, so the user is left to experiment, for better or worse.

The window is quite small and requires a lot of scrolling, but there are many options for virtual singers and ensembles here.

After selecting a voice, you’ll see a “Rendering…” message in the bottom-left corner of the plugin window. While this message is shown, Cantai is rendering a hidden audio file of that voice part in the background; on playback, this file is synchronized with the other instruments. You can close the window, and the plugin will continue rendering silently.

Once the rendering is complete, Cantai will sing your vocal music when you play back your score. Because of how Cantai works (more on that below), you won’t hear vocal sounds while you write or select a note, as you might with instrumental plugins. Instead, you’ll hear a simple sine tone as you navigate your score. While I understand the limitations here, I think a sine tone is a poor choice: it sounds very soft compared to other virtual instruments, which leads to a lot of volume adjustment between writing and playing back.

How it works

Cantai is able to work in Dorico because it can read the entire score ahead of time. As Scoring Notes readers may be aware, NotePerformer uses a one-second look-ahead during playback, giving the orchestral playback plugin enough context to make thoughtful decisions about shaping phrases. Vocalizing text requires even more context, though. The syllables of a single word can be spread across many notes, and the same written word can have a different pronunciation and meaning depending on its context in the sentence. (Fun fact: such a word is called a heteronym.) For example: “The wholesome Polish gentleman lives to hear live music with a fresh polish on his shoes.” Cantai would need the whole sentence to determine how to pronounce “live” and “polish.”

When you add a Cantai voice to a staff in your score, it immediately begins rendering in the background on your computer. This happens silently and can take a few seconds for a short file; for longer files on a slower computer, it may take a bit longer. Even on somewhat longer files — such as an eight-flow song cycle — my M2 MacBook Pro rarely needed more than 20 seconds when rendering a new voice. That’s not terribly long considering everything happening in those 20 seconds, but if you’re interested in trying out several different voice options, be prepared to sit and wait a bit. Richard tells us that a future update will add an in-menu preview, “so when you hover your mouse over a singer, it will play an audio sample of that singer — so you don’t have to render the entire part to hear what they sound like.” That will be a welcome addition when it arrives!

Once Cantai audio is rendered, the prepared audio file plays back synchronized with whatever other VSTs you may be using for instrumental playback. I am slightly concerned about the disk space these uncompressed audio files may occupy on the computer’s primary volume. I’ve only used Cantai for Dorico during a two-week beta period, but I can imagine the files adding up over months and years. It is worth noting that Cantai’s settings panel (click the ellipsis button in the top right) includes an option to clear the cache manually if space becomes a concern.

Ethically trained voice models

There are many justifiable concerns any time a tool that relies on AI or machine learning touches the creative process in this way. This is something Richard has carefully considered in designing and developing Cantai.

Cantai’s voices are all trained specifically for this purpose, and all of the singers who contributed were paid for their recording time. Additionally, Cantai singers earn a royalty in proportion to how often Cantai users add their voices to projects. As Richard told us, these singers are “lending their voices to something that is beyond their physical limitations. They can be in a thousand recording studios at once now, as opposed to just one. So they should get paid for every time they go into the studio, whether they’re there or not — because it’s their voice. It’s their property.”

In addition to its own voice models, Cantai uses technology from OpenAI called Whisper in its voice-training pipeline. Whisper is a widely used speech recognition model, and Cantai uses it to connect lyric text with the phonemes and formants recorded by its vocalists.

A few caveats

While I find Cantai to be a very impressive engineering feat and an incredibly efficient tool, there are some caveats worth noting.

For now (a clause that probably applies to most of my gripes), Cantai’s language support is limited to English and Latin. The English pronunciation is mostly right. For example, in the context of the “Alma Mater” hymn, which is mostly in English, the pronunciation of “Alma” is a bit off: the first vowel is closer to the a in “cat” than to the a in “balm.”

Having said that, Cantai does a passable job with strange proper nouns like “Wichita.” The language model that Cantai uses also supports Spanish, Italian, French, and Japanese, but those languages aren’t yet built into the product. Richard has stated publicly that these and other languages (including German) are in the pipeline. It’s also worth noting that you can’t tell Cantai what language you’re using, so a stray Latin word in an otherwise English lyric might still trip it up from time to time. There’s currently no mechanism for correcting Cantai’s pronunciation other than rewriting the lyric phonetically in the score.

Speaking of which, there are almost no controls in the Cantai plugin. This is both a feature and a limitation: it means that things like the amount of vibrato, the degree of sliding between notes, and other performance nuances are entirely up to the recorded voices and the machine-learning model. If one voice doesn’t give you what you’re looking for, you are more or less out of luck.

The voices slider and the clear cache and reset buttons are the only controls offered by the Cantai plugin.

One of the few direct settings available is the number of voices in the choir sections (up to ten singers). Selecting more voices makes rendering take significantly longer: even in a short 16-measure choral hymn, I waited several minutes while Cantai rendered ten singers for each of my four parts. In practice, I would probably stick with solo voices while actively working on a piece and only switch to choral sections near the end. Also, because the choral sections are built from the same recorded voices as the solo options, the sound is much more like an opera chorus than what I usually expect of a choir. Choral singing and classical solo singing produce very different sounds, and I wish I could choose a tone that is more choral in character and articulation. And while on the topic of choirs: I do wish the choir sections were labeled by voice range (e.g., “Treble Chorus”) rather than by gender (e.g., “Female Chorus”).

Worth reiterating in the context of choirs: Cantai is monophonic. If you are writing for choir in closed score (e.g., a soprano-alto staff and a tenor-bass staff), Cantai gets confused and produces some very strange results! To handle a four-part choir, use open score for now. Even then, any momentary divisi will trip up the renderer. Again, this is something Cantai hopes to address soon.

Beyond classical music, there are very limited options among Cantai’s voices. A few lean slightly toward musical theatre to my ear, but they are all classically informed. Again, Cantai plans to address this, and Richard identified it as one of his higher priorities. In our podcast interview this month, he said that he was actively seeking new vocalists for less classically inflected Cantai voices.

Cantai doesn’t cope well with the unexpected, even relatively common things like spoken passages. If it encounters a notation it can’t parse, it simply renders nothing. One of the scores I tested, a 20-minute song cycle, contained just two syllables of spoken text, and as a result Cantai didn’t sing any of the piece at all. Richard tells us that support for these kinds of notations is on his to-do list: “I’m working on a project now that has Sprechgesang, and Cantai doesn’t do that yet. And it will, because I really want it to be in there.”

One last area where I wish Cantai did better is dynamics and articulation. In some of the scores I tested, even very stark articulation contrasts were rendered inconsistently by the Cantai voices. And while Cantai does follow dynamic markings as changes in volume, it doesn’t vary timbre or diction the way I would expect a singer to between pianissimo and fortissimo.

Some of these limitations are nitpicks, and nearly all of them are on Cantai’s development roadmap. They’re not dealbreakers for me, but for many composers and arrangers they would be.

Conclusions

If you’ve read this far, you’re probably thinking: this all sounds neat, but is it good? Does it actually sound good?

After a lot of experimenting with Cantai, I can tell you that it really depends. One of the challenges with many machine-learning implementations is that they can be hard to predict and harder to control. In some circumstances, Cantai sounds wonderful. In others, it sounds a bit off — and not in a good way. Certain kinds of melodic lines (large or awkward leaps) in certain Cantai voices can produce audio artifacts, and some notes in faster melismas get “ghosted” so they’re barely heard at all. Even short of those extremes, there is sometimes an “uncanny valley” quality to the rendered audio.

That said, Cantai’s rendering is improving all the time — dramatically over the past year, and noticeably even during the Dorico plugin beta. It’s safe to say the plugin will sound better in a few months, and better still in a year.

It’s also important to consider the alternatives. I began this article by describing the generic “Ooh” and “Aah” voices we’re probably used to hearing from notation software — many of us have probably changed choir playback to strings or clarinets just to avoid them. Setting up Cantai isn’t much more involved than making that switch.

At the other end of the spectrum are DAW-focused plugins like Synthesizer V and EastWest Hollywood Choirs, which can produce excellent virtual vocals. Using these in Dorico is possible but requires considerable manual work, and there is a limit to what you can accomplish without bouncing to your DAW. Check out John Barron’s Discover Dorico livestream on Synthesizer V to get a sense of how deep this rabbit hole goes.

I don’t think Cantai, at least as it stands now, is going to replace these more powerful vocal plugins, just as NotePerformer hasn’t had a measurable impact on Vienna Symphonic Library sales. But to dismiss Cantai on those grounds is to miss its biggest virtue: set it and forget it. The value is in the very high ratio of quality to effort.

That quality-to-effort ratio is why I expect Cantai to become an everyday staple for many composers and arrangers.

Pricing and availability

Cantai for Dorico is available now for $299 as a one-time purchase, which includes all voices yet to come. Educators — including private teachers — are eligible for a discounted rate of $99, and current students can access Cantai for free while enrolled. This is a generous offer that I expect many of my university students will take advantage of.

Cantai for MuseScore Studio is currently available through MuseHub for $14.99 per month and is included in the $20 per month (or $200 per year) Muse Sounds Pro bundle.

Cantai for Sibelius will enter beta soon, with release planned for May 30. Pricing is the same as for Dorico; during the beta preorder period, users can purchase it at a discounted rate of $150.

Richard tells us that “whenever there’s a new voice or an ensemble, it will be added to Cantai at no additional cost to the user” — for all three versions.


Want to hear more from Richard deCosta about the story behind Cantai, the technology that makes it work, and his long-term vision for vocal synthesis in notation software? Philip and I sat down with him for a wide-ranging conversation — including the ethical model he’s built around the singers who power Cantai, what’s on his very long to-do list, and what comes after notation software entirely. The interview is available on the Scoring Notes podcast.
