You might have many reasons to do speech-to-text (STT) transformations locally: privacy, custom-trained models, or maybe you just don't want the latency that comes with online services. I have a podcast that I want to transcribe and generate captions for, and I wanted to do that blazingly fast.

One of the choices for STT might be DeepSpeech - a library developed by Mozilla that does just that. More than that, it comes with a pre-trained English speech model that you can start using right away. As I started exploring the library, I realized that it had Windows builds, but no concrete instructions on how to get things running on the OS. My primary machine is no longer UNIX-based, so I had a personal interest in getting it working properly - I could finally put my RTX 2080 to good use.

It all starts pretty trivially, as outlined in the official instructions:

- Install the deepspeech-gpu package (if you don't have a beefy GPU, no worries - just use deepspeech).
- Make sure that the right version of CUDA and the associated cuDNN are installed.

Easy, right? Or so I thought. When I fed my WAV file through DeepSpeech, the client code started as follows:

```csharp
using DeepSpeechClient;
using System;
using System.IO;

namespace ds_dotnet
```

With the CPU (AMD Ryzen 9 3900XT 12-Core Processor) library, it took about 24 minutes to generate the transcript. With the GPU (NVIDIA GeForce RTX 2080 SUPER) library - 27 minutes.

That was my adventure running DeepSpeech on Windows. I do need to write a better sample snippet and explain it in a future blog post, because taking one giant file and waiting for it to complete is not really sustainable, but it works for now.
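The setup steps can be captured as a couple of shell commands. This is a sketch, assuming pip and the 0.9.x release line; the exact model file names come from the DeepSpeech GitHub releases page and may differ for the version you target:

```shell
# GPU-accelerated build (needs a matching CUDA + cuDNN install);
# on a machine without a supported GPU, install "deepspeech" instead.
pip install deepspeech-gpu

# Pre-trained English model and external scorer from the releases page
# (0.9.3 shown as an example version - adjust to your release).
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer
```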
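The same "one giant file in a single call" approach can also be sketched in Python against the deepspeech package, which mirrors what the C# client does. The file names here are hypothetical, and the stock English model expects 16 kHz, 16-bit, mono PCM audio, so the sketch validates that before handing the samples over:

```python
import wave

import numpy as np


def read_wav_samples(path):
    """Read a WAV file; return (sample_rate, int16 numpy array of mono samples)."""
    with wave.open(path, "rb") as w:
        assert w.getsampwidth() == 2, "DeepSpeech expects 16-bit samples"
        assert w.getnchannels() == 1, "DeepSpeech expects mono audio"
        frames = w.readframes(w.getnframes())
        return w.getframerate(), np.frombuffer(frames, dtype=np.int16)


def transcribe(model_path, wav_path):
    """Run DeepSpeech over an entire WAV file in one blocking call."""
    from deepspeech import Model  # from the deepspeech / deepspeech-gpu package

    model = Model(model_path)
    rate, samples = read_wav_samples(wav_path)
    # The pre-trained model is trained on 16 kHz audio; resample first if needed.
    assert rate == model.sampleRate(), "resample the audio to the model's rate first"
    return model.stt(samples)


# Hypothetical paths, for illustration:
# print(transcribe("deepspeech-0.9.3-models.pbmm", "podcast.wav"))
```

One blocking `stt()` call over a multi-hour podcast is exactly the "giant file" pattern described above; the streaming API would be the more sustainable route for long recordings.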