Generate subtitles for your videos with Whisper AI

Lately, anything with the word AI in it is in fashion. There are some very interesting articles, others that are repeated across hundreds of blogs, and many more that are pure clickbait.

But some of what is being built lately is a real wonder, and today I want to tell you about the workflow that I have incorporated into the creation of the course I’m working on, WordPress Ninja Developer.

I’m going to leave aside the whole process of scripting, shooting and video editing. Here we start from the final video that I upload to the streaming platform.

Video to audio

In the process I describe here you can work directly with the videos, but since I am going to take advantage of cloud processing, uploading a 157 MB video is not the same as uploading its corresponding 3.68 MB audio. So first let’s extract the audio from the videos.

There are many ways to perform this conversion and to run it in batch over all our videos. The easiest are graphically with VLC or from the console with FFmpeg.

In VLC we open the menu and choose the Convert option (Ctrl + R, or Cmd + R on Mac). From there we select all the videos we want to convert, click Convert/Save and select the Audio – MP3 profile. Soon we will have all our videos converted to MP3, with the same name as the original and in the same folder.

The other option is the command line; here we are going to use WSL with Ubuntu.

The essential requirement is to have FFmpeg installed, which we do with:

sudo apt-get update

sudo apt-get install ffmpeg

And check that it is installed correctly with ffmpeg -version.

Then we enter the folder that contains the videos in MP4 format (if they are in another format, we will have to adapt the command) and execute:

for i in *.mp4; do ffmpeg -i "$i" -vn -ab 128k -ar 44100 -y "${i%.mp4}.mp3"; done

This command uses a for loop to iterate over each MP4 file in the folder and runs FFmpeg on it to convert it to MP3. The argument -vn discards the video stream so that only the audio is converted, -ab 128k sets the audio bitrate to 128 kbps, -ar 44100 sets the audio sample rate to 44100 Hz, and -y overwrites an existing output file without asking. The expansion ${i%.mp4} strips the .mp4 extension from the file name so that .mp3 can be appended in its place.

Once the conversion is complete, you will find the MP3 files in the same folder as the source videos, with the same names but the .mp3 extension (we can change this by modifying the command above).

We can also customize FFmpeg arguments according to specific needs.
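If you prefer to drive the conversion from Python rather than from the shell, the same loop can be sketched as follows. This is only a minimal sketch: the function names build_ffmpeg_command and convert_folder are hypothetical, it assumes ffmpeg is on the PATH, and it reuses exactly the arguments of the command above, exposing the bitrate and sample rate as parameters so they are easy to customize.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video: Path, bitrate: str = "128k",
                         sample_rate: int = 44100) -> list[str]:
    """Build the ffmpeg argument list for one video (hypothetical helper)."""
    audio = video.with_suffix(".mp3")  # same name, .mp3 extension
    return [
        "ffmpeg",
        "-i", str(video),         # input video
        "-vn",                    # drop the video stream
        "-ab", bitrate,           # audio bitrate
        "-ar", str(sample_rate),  # audio sample rate in Hz
        "-y",                     # overwrite existing output
        str(audio),
    ]


def convert_folder(folder: str, pattern: str = "*.mp4", **kwargs) -> list[Path]:
    """Convert every matching video in `folder` and return the MP3 paths."""
    outputs = []
    for video in sorted(Path(folder).glob(pattern)):
        command = build_ffmpeg_command(video, **kwargs)
        subprocess.run(command, check=True)  # raises if ffmpeg fails
        outputs.append(Path(command[-1]))
    return outputs
```

Calling convert_folder(".", bitrate="96k") would, under these assumptions, convert every MP4 in the current folder at 96 kbps. Passing the arguments as a list avoids quoting problems with file names that contain spaces.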

Create subtitles

And here comes the most interesting part, where we use OpenAI’s AI model and Google’s servers, all for free.

For this task we will use Whisper, the speech recognition model that OpenAI has released as open source. Since these AI operations benefit greatly from GPU processing, the best option is to use the machines that Google provides for free for this kind of task through Google Colab.

For this task I have created a Python script in Google Colab that takes care of installing Whisper and mounting our Google Drive, for which it will ask for the corresponding permissions.
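To give an idea of what such a setup step can look like, here is a minimal sketch; it is not the code of the Colab linked in this post. The helper names are hypothetical. It assumes the openai-whisper package from PyPI and the google.colab module, which is only available inside a Colab runtime, so mount_google_drive will fail anywhere else.

```python
import subprocess
import sys


def pip_install_command(package: str = "openai-whisper") -> list[str]:
    """Return the pip command that installs `package` for this interpreter."""
    return [sys.executable, "-m", "pip", "install", "-U", package]


def install_whisper() -> None:
    """Install Whisper with pip (assumes network access, as in Colab)."""
    subprocess.run(pip_install_command(), check=True)


def mount_google_drive(mount_point: str = "/content/drive") -> str:
    """Mount Google Drive; Colab asks for permission the first time."""
    # Imported here because google.colab only exists inside Colab.
    from google.colab import drive

    drive.mount(mount_point)
    return mount_point
```

In a Colab cell you would call install_whisper() and then mount_google_drive(); after that, the files in your Drive should be reachable under the mount point.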

In the next step it creates the folders where the subtitle files and translations will be copied. Here we must choose whether we want to transcribe, translate or both, which Whisper model to use, and whether our video is in Spanish or English or the language should be auto-detected.
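The options and folders of this step could be modelled roughly like this. It is a sketch under assumptions: the SubtitleOptions class, the folder names "subtitles" and "translations" and the default model "small" are all hypothetical and not taken from the Colab. Note that Whisper's translate task produces English text.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# Hypothetical folder name for each Whisper task.
FOLDER_NAMES = {"transcribe": "subtitles", "translate": "translations"}


@dataclass
class SubtitleOptions:
    task: str = "both"              # "transcribe", "translate" or "both"
    model: str = "small"            # Whisper model size, e.g. "tiny" or "medium"
    language: Optional[str] = None  # "es", "en", or None to auto-detect


def selected_tasks(options: SubtitleOptions) -> list[str]:
    """Expand the chosen option into the list of Whisper tasks to run."""
    if options.task == "both":
        return ["transcribe", "translate"]
    if options.task in FOLDER_NAMES:
        return [options.task]
    raise ValueError(f"Unknown task: {options.task!r}")


def prepare_output_dirs(base_dir: str, options: SubtitleOptions) -> dict[str, Path]:
    """Create one output folder per selected task and return their paths."""
    dirs = {}
    for task in selected_tasks(options):
        path = Path(base_dir) / FOLDER_NAMES[task]
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        dirs[task] = path
    return dirs
```

With the default options, prepare_output_dirs would create both folders under the base directory you pass in, for example a folder inside the mounted Drive.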

In the third step it takes care of transcribing and translating the audio file extracted from the video.
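As an illustration of this step, the following sketch transcribes one audio file with Whisper and turns the resulting segments into SRT subtitle text. Again, this is an assumption about how it can be done, not the code of the Colab itself. The function names are hypothetical; whisper.load_model and model.transcribe come from the openai-whisper package, which is imported lazily so that the formatting helpers can be used without it.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Turn Whisper-style segments (start, end, text) into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)  # a blank line separates the blocks


def transcribe_to_srt(audio_path, model_name="small",
                      task="transcribe", language=None) -> str:
    """Run Whisper on one audio file and return the subtitles as SRT text."""
    import whisper  # requires the openai-whisper package

    model = whisper.load_model(model_name)
    # task="translate" produces English text; language=None auto-detects.
    result = model.transcribe(str(audio_path), task=task, language=language)
    return segments_to_srt(result["segments"])
```

The returned text can be written to a file with the same name as the audio and the .srt extension, which is the format most streaming platforms accept for subtitles.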

The Colab, which you can copy and modify to your liking, is at https://colab.research.google.com/drive/1uSaLKvXwvhUS2hEi8_OFU-Ymq97aRfe1?usp=share_link

As a last step, before uploading the subtitle files to the final video, I check them with LanguageTool, both from Visual Studio Code and with the LanguageTool desktop application, as you can see at the end of the following explanatory video: