Improving WaveRNN with Heuristic Dynamic Blending for Fast and High-Quality GPU Vocoding

Auto-regressive vocoders are typically less efficient at inference due to their serial nature, which makes it difficult to fully utilize graphics processing units (GPUs). In this context, batched inference with upsampled feature folding can be used to speed up vocoding. However, the speech quality degradation caused by blending the folded waveform segments makes this approach hard to apply in production. To address this issue, we propose a novel blending approach called heuristic dynamic blending (HDB), which effectively addresses the voice trembling and diplopia issues of conventional static blending. We also propose a parallel algorithm for HDB that runs on GPUs and significantly reduces the additional time overhead introduced by the naive HDB algorithm. Experimental results demonstrate that a multi-band WaveRNN with HDB can effectively improve parallelism for real-time GPU vocoding while maintaining high speech quality comparable to non-folding inference.
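As a rough illustration of the folding idea, the sketch below splits a sequence of upsampled feature frames into overlapping segments that could then be vocoded as one batch. The function name `fold` and its parameters `seg_len` and `overlap` are hypothetical. Treating them as the counterparts of \(\hat{SL}\) and \(\hat{OL}\) on this page, and measuring them in frames, are assumptions rather than the authors' definitions.

```python
def fold(frames, seg_len, overlap):
    """Split `frames` into segments that share `overlap` items with their successor.

    Hypothetical helper: each segment starts `seg_len` items after the previous
    one and carries `overlap` extra items, which are later used for blending.
    """
    if seg_len <= 0 or overlap < 0:
        raise ValueError("seg_len must be positive and overlap non-negative")
    segments = []
    start = 0
    while start < len(frames):
        # Clamp the end so the final segment may be shorter than the others.
        end = min(start + seg_len + overlap, len(frames))
        segments.append(frames[start:end])
        if end == len(frames):
            break
        start += seg_len
    return segments


if __name__ == "__main__":
    # Ten dummy "frames" folded with a segment length of 4 and an overlap of 2.
    print(fold(list(range(10)), seg_len=4, overlap=2))
```

With these toy values the ten frames become two segments of six frames each, and frames 4 and 5 appear in both. A real pipeline would additionally pad the segments to equal length before stacking them into a batch.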

Audio Samples

Dataset: The LJ Speech Dataset - Keith Ito


Non-Folding: Non-Folding inference.
Folding wo Blending: Folding inference without blending. Produces crackling noise.
Folding w Static Blending: Folding inference with static blending. Produces voice trembling and echo.
Folding w HDB: Folding inference with the proposed Heuristic Dynamic Blending.
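
To make the "static blending" condition more concrete, here is a minimal sketch that joins independently generated waveform segments by cross-fading a fixed overlap region. The linear fade, the function name `static_blend`, and the `overlap` parameter (in samples) are assumptions made for illustration. This page does not specify the exact static blending used, and the HDB algorithm itself is not described here, so it is not sketched.

```python
def static_blend(segments, overlap):
    """Concatenate waveform segments, cross-fading a fixed overlap region.

    Hypothetical helper: the last `overlap` samples of the running output are
    mixed with the first `overlap` samples of the next segment using linear
    weights that never depend on the signal content.
    """
    if not segments:
        return []
    out = list(segments[0])
    for seg in segments[1:]:
        # Guard against segments shorter than the requested overlap.
        n = min(overlap, len(out), len(seg))
        for i in range(n):
            w = (i + 1) / (n + 1)  # fade-in weight of the incoming segment
            j = len(out) - n + i   # matching position in the running output
            out[j] = (1.0 - w) * out[j] + w * seg[i]
        out.extend(seg[n:])
    return out


if __name__ == "__main__":
    # Two toy segments with different levels, blended over two samples.
    print(static_blend([[0.0] * 4, [3.0] * 4], overlap=2))
```

Because the blend position and weights are fixed, two segments whose waveforms disagree inside the overlap are simply averaged there, which is one plausible way to picture the artifacts listed for static blending above.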


Audio Samples of \(\hat{SL}=100\), \(\hat{OL}=50\)
Name | Ground Truth | Non-Folding | Folding wo Blending | Folding w Static Blending | Folding w HDB
LJ009-0110
LJ009-0207
LJ010-0138
LJ015-0159
LJ016-0326
LJ033-0003
LJ039-0184
LJ043-0080

Audio Samples of \(\hat{SL}=500\), \(\hat{OL}=50\)
Name | Ground Truth | Non-Folding | Folding wo Blending | Folding w Static Blending | Folding w HDB
LJ009-0110
LJ009-0207
LJ010-0138
LJ015-0159
LJ016-0326
LJ033-0003
LJ039-0184
LJ043-0080

Audio Samples of \(\hat{SL}=1000\), \(\hat{OL}=50\)
Name | Ground Truth | Non-Folding | Folding wo Blending | Folding w Static Blending | Folding w HDB
LJ009-0110
LJ009-0207
LJ010-0138
LJ015-0159
LJ016-0326
LJ033-0003
LJ039-0184
LJ043-0080