SpeechFlow is a self-supervised generative model pre-trained on unlabeled speech. It shares a similar architecture and objective function with Voicebox, and can be fine-tuned for different tasks at a significantly lower cost. See the paper for more details.
With 62.5x less labeled data, SpeechFlow can be fine-tuned to perform zero-shot TTS at Voicebox-level quality.
Thus did this humane and right minded father comfort his unhappy daughter and her mother embracing her again did all she could to soothe her feelings
They moved thereafter cautiously about the hut groping before and about them to find something to show that warrenton had fulfilled his mission
And lay me down in thy cold bed and leave my shining lot
And the whole night the tree stood still and in deep thought
Instead of shoes the old man wore boots with turnover tops and his blue coat had wide cuffs of gold braid
The army found the people in poverty and left them in comparative wealth
Yea his honourable worship is within but he hath a godly minister or two with him and likewise a leech
He was in deep converse with the clerk and entered the hall holding him by the arm
Number ten fresh nelly is waiting on you good night husband
SpeechFlow can also be fine-tuned to separate overlapped speech.
SpeechFlow can also be fine-tuned to remove noise from speech.