Podcast, YouTube to text

Let's start with the easy problem: transcribing a YouTube video to text, using YouTube and Chrome. You can adapt to other browsers without too much trouble.

Then the slightly harder problem: transcribing something that's not already on YouTube. That's easy too. Upload it to YouTube, wait for Google to transcribe it, and you've reduced it to an already solved problem.

Source of technique: this page.

Three steps:

1. Turn on captions.
2. Grab the text.
3. Clean it up.

Turn on captions:
Turn on "Subtitles/Closed Captions" for the video. If the original provider did not provide subtitles or captions, Google will have turned its servers loose on the problem.

Grab the text:
As soon as the subtitles start displaying, open Chrome Developer Tools by pressing F12.

At the bottom of the developer tools is a console pane. If it's not there, press ESC.

Copypasta this into the console:
if(yt.config_.TTS_URL.length) window.location.href=yt.config_.TTS_URL+"&kind=asr&fmt=srv1&lang=en"
That will change the YouTube page into a bunch of text. Your Chrome tab might look like this:


You can see the copy/paste in the console, at the bottom right, and HTML for the transcription on the right.

Now put your cursor in that tab, select all the text (Ctrl-A), and copy it (Ctrl-C). Great. Now we've got a transcription with a bunch of HTML tags intermixed.
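If you'd rather see what that one-liner is doing, here is a rough sketch that pulls the URL-building part into a small function. The name buildTranscriptUrl is made up for illustration, and it assumes, like the one-liner, that the page exposes a base caption URL in yt.config_.TTS_URL. YouTube may change or remove that at any time.

```javascript
// Hypothetical helper (not part of YouTube): builds the auto-caption URL
// from a base timed-text URL, using the same query parameters as the
// console one-liner above.
function buildTranscriptUrl(ttsUrl, lang = "en") {
  if (!ttsUrl) {
    return null; // the page exposed no caption URL
  }
  // kind=asr asks for the automatic (speech recognition) track,
  // fmt=srv1 and lang are the values the one-liner uses.
  return ttsUrl + "&kind=asr&fmt=srv1&lang=" + encodeURIComponent(lang);
}

// In the console on a YouTube page, usage might look like this
// (assumption: yt.config_.TTS_URL is defined on that page):
//   const url = buildTranscriptUrl(yt.config_.TTS_URL);
//   if (url) window.location.href = url;
```

Unlike the one-liner, this version returns null instead of throwing when there is no caption URL to work with.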

Clean it up:
Now you can paste this into your editor, and remove all the stuff that doesn't count.

Or, if you are technical, you can go to this site, run the app, and paste it in. The app will remove all the tags for you.

And, bonus, if you type in a space, comma, or period at the end of the line, it will add the punctuation, and merge it with the line following. What can be better?
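If you'd rather do the tag removal yourself, something like the following might do as a starting point. This is a minimal sketch, not the app's actual code: stripCaptionMarkup is a made-up name, the entity list only covers a few common cases, and it doesn't attempt the punctuation trick.

```javascript
// Hypothetical helper: reduce copied caption markup to plain text.
function stripCaptionMarkup(raw) {
  // Only a handful of common entities; extend as needed.
  const entities = {
    "&amp;": "&",
    "&lt;": "<",
    "&gt;": ">",
    "&quot;": '"',
    "&#39;": "'",
  };
  return raw
    .replace(/<[^>]*>/g, " ") // replace every tag with a space
    .replace(/&(amp|lt|gt|quot|#39);/g, (m) => entities[m]) // decode entities
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .trim();
}
```

Paste the copied text into a string, pass it through the function, and you get one long line of plain text that still needs punctuation added by hand.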

If I get in the mood I will improve the app and host it somewhere more people-friendly.

Until then, that's whatcha got.


