
Hear Me Out

Gergely Juhász
Machine Learning Engineer

Let’s Talk!

For no particular reason other than it’s an interesting project - and because I can - I decided to give my blog a voice with Kokoro Text-To-Speech. I would argue that Kokoro sounds better than most TTS engines behind “Listen to article” buttons, including OpenAI’s.

The Audio

Although most browsers have a built-in TTS API, I wanted consistency and state-of-the-art quality. According to web analytics¹, most of you visit my blog from your phones. To save battery and ensure a smooth experience, the audio is pre-generated and saved as mp3.

There are two voices to choose from: Isabella and Michael.

Technical note: For now, this is just a standalone Python script. It strips out most Markdown so that the spoken text makes sense. Kokoro produces uncompressed audio by default, which makes the files quite large. Converting to .mp3 wasn’t trivial, but ChatGPT helped me out. Eventually, I might automate this in a CI/CD pipeline, but for now I run it manually, which is no hassle.
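The script itself isn’t published, so here is only a rough idea of its two non-Kokoro steps as a minimal Python sketch: a regex pass that strips common Markdown before the text goes to the TTS model, and a wrapper around the ffmpeg CLI that turns a WAV file into an mp3. The function names, the 64k bitrate and the use of ffmpeg with libmp3lame (assumed to be installed and on PATH) are assumptions for illustration, not necessarily what the actual script does.

```python
import re
import subprocess
from pathlib import Path


def strip_markdown(md: str) -> str:
    """Rough regex pass that removes Markdown syntax a TTS engine would read aloud."""
    text = re.sub(r"`{3}.*?`{3}", "", md, flags=re.DOTALL)                   # drop fenced code blocks
    text = re.sub(r"`([^`]*)`", r"\1", text)                                 # keep inline code, drop backticks
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", text)                         # drop images entirely
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)                     # keep link text, drop the URL
    text = re.sub(r"^[ \t]{0,3}#{1,6}[ \t]*", "", text, flags=re.MULTILINE)  # heading markers
    text = re.sub(r"^[ \t]*[-*+][ \t]+", "", text, flags=re.MULTILINE)       # bullet markers (before emphasis)
    text = re.sub(r"\*{1,2}([^*]+)\*{1,2}", r"\1", text)                     # bold / italic asterisks
    text = re.sub(r"\n{3,}", "\n\n", text)                                   # collapse leftover blank runs
    return text.strip()


def mp3_command(wav_path: Path, bitrate: str = "64k") -> list[str]:
    """Build an ffmpeg command that encodes wav_path to an mp3 next to it (bitrate is an assumption)."""
    mp3_path = wav_path.with_suffix(".mp3")
    return ["ffmpeg", "-y", "-i", str(wav_path),
            "-codec:a", "libmp3lame", "-b:a", bitrate, str(mp3_path)]


def convert_to_mp3(wav_path: Path, bitrate: str = "64k") -> Path:
    """Run ffmpeg (must be on PATH) and return the path of the mp3 it wrote."""
    subprocess.run(mp3_command(wav_path, bitrate), check=True)
    return wav_path.with_suffix(".mp3")
```

A regex pass like this is deliberately crude: it misses tables, raw HTML and nested emphasis, which is usually acceptable when the goal is simply audio that doesn’t read out asterisks. For a sense of scale, assuming 24 kHz, 16-bit mono WAV output, one minute of speech is about 2.9 MB uncompressed versus about 0.48 MB as a 64 kbps mp3.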

If you’re curious, you can find Kokoro on Hugging Face.

It’s still amazing to think about the progress in open-source TTS. Around 10 years ago, I was learning French² and struggled to remember the numbers and their pronunciation. I experimented with free tools to generate the audio for a memory-card app to help me learn, but the results were horrible. Thankfully, Google Translate already had excellent TTS. With a small bash script, I hit their API endpoint and generated audio for the numbers 0 to 100. Problem solved!

In 2025, I can fire up VSCode, import a Python library, and get even better results in seconds.

Custom Audio Player

Once I had the audio, the next task was integrating it nicely into my blog. Neither Hugo nor Blowfish had an audio player that fit my needs, so I had to build my own on top of the default one.

I don’t have much free time lately, so instead of dusting off my CSS and JavaScript skills, I went straight to Qwen 2.5 and ChatGPT. The code quality could be better - ideally broken into multiple files - but it works.

Qwen struggled a bit with formatting and understanding my very well-defined requirements (see below), but ChatGPT - unsurprisingly - nailed it. Here’s what I wanted the player to look like:

(>) 0.5x 1x 1.5x 2x                               | Isabella | 
----------O------------------------------------- 01:30 / 05:30

I only had to request a few adjustments for better contrast, and it worked beautifully!
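The mock-up above shows elapsed and total time as MM:SS, and the speed buttons change how long the rest of an article takes to hear. The player itself is CSS and JavaScript; the Python below only illustrates that arithmetic, and the helper names are hypothetical.

```python
def format_time(seconds: float) -> str:
    """Format a position in seconds as MM:SS, dropping fractional seconds."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def time_label(position: float, duration: float) -> str:
    """Elapsed / total label, as on the right-hand side of the progress bar."""
    return f"{format_time(position)} / {format_time(duration)}"


def remaining_at_rate(position: float, duration: float, rate: float) -> float:
    """Wall-clock seconds left when the rest of the audio plays at `rate` (1.5 means 1.5x)."""
    if rate <= 0:
        raise ValueError("playback rate must be positive")
    return max(duration - position, 0.0) / rate


print(time_label(90, 330))                            # 01:30 / 05:30
print(format_time(remaining_at_rate(90, 330, 1.5)))   # 02:40
```

In other words, the four minutes left in the mock-up’s example shrink to 02:40 at 1.5x and to 02:00 at 2x.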

Automation > Ctrl+C, Ctrl+V

You may have noticed a few extra icons appear next to my name. Not that anyone asked, but now I’m not only on LinkedIn but also on Bluesky, Mastodon and Threads!

Different platforms have their own approaches: Bluesky provides an RSS feed automatically, Mastodon is free, open-source and decentralised, while Threads hands out account suspensions like free candy³.

I’m not a huge fan of social media and rarely post personal content. I had thought about sharing on these platforms before, but I was too lazy. However, I do enjoy experimenting with self-hosted apps - which I plan to cover in future posts - and came across Postiz. I needed a reason to add it to my toolkit, so here we are: I’m on more platforms now!

Postiz demo taken straight from https://github.com/gitroomhq/postiz-app

For now, I’m only using its core feature: scheduling posts and publishing them simultaneously to multiple social media platforms. Some of the platforms also provide analytics, such as views and interactions, through their APIs - which Postiz helpfully collects. That’s the cherry on top.
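Postiz is a full application and its internals aren’t covered here. Purely to illustrate the idea of scheduled cross-posting, here is a toy Python sketch: posts wait in a time-ordered queue and, once due, are sent to every configured platform, with a failure on one platform recorded instead of stopping the rest. `CrossPoster`, `ScheduledPost` and the `Publisher` callables are hypothetical stand-ins for real platform API clients; this is not Postiz’s API.

```python
import heapq
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable

# Hypothetical publisher: takes the post text and returns a platform-specific post ID.
Publisher = Callable[[str], str]


@dataclass(order=True)
class ScheduledPost:
    publish_at: datetime                  # only this field is used for ordering
    text: str = field(compare=False)


class CrossPoster:
    """Toy scheduler: queue posts by publish time, then fan each one out to every platform."""

    def __init__(self, publishers: dict[str, Publisher]) -> None:
        self.publishers = publishers
        self.queue: list[ScheduledPost] = []    # min-heap ordered by publish_at

    def schedule(self, text: str, publish_at: datetime) -> None:
        heapq.heappush(self.queue, ScheduledPost(publish_at, text))

    def run_due(self, now: datetime) -> list[tuple[str, dict[str, str]]]:
        """Publish every post due at or before `now`; return (text, per-platform result) pairs."""
        results = []
        while self.queue and self.queue[0].publish_at <= now:
            post = heapq.heappop(self.queue)
            outcome: dict[str, str] = {}
            for name, publish in self.publishers.items():
                try:
                    outcome[name] = publish(post.text)
                except Exception as exc:        # one failing platform must not block the others
                    outcome[name] = f"error: {exc}"
            results.append((post.text, outcome))
        return results
```

A real setup would call something like `run_due` periodically (for example from a cron job) and replace the callables with actual API clients.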


  1. Don’t worry, it’s privacy-respecting tracking; see the Privacy Policy.

  2. The only thing I remember - other than the numbers - is “désolé, je ne parle pas français” (“sorry, I don’t speak French”). I can say it with a very thick accent!

  3. I’m still grumpy about having mine suspended for a day. I’m not sure if it was because of programmatic posting or inactivity after registration, but I’m not alone.