The Great Merge: The Convergence of Text, Audio, and Video
Throughout history, the media landscape has been repeatedly reshaped by the arrival of new media types. Printed media, which flourished at the end of the 18th century and especially in the 19th, was joined by cinema in the early 20th century, radio in the 1930s, and television from the 1950s and 1960s onwards. Each new entrant disrupted its predecessors without displacing them, and until recently, text, audio and image each had their own niche, coexisting in a delicate balance.
When the internet was popularized in the early 2000s, it didn’t immediately create any new media types: internet media was typically a combination of online press, podcasts and videos (and occasionally a mix of the three). However, since I started following publishing trends, I've noticed the emergence of a new paradigm: the gradual abolition of the boundaries between these three universes of text, audio and video.
When text becomes audio
Over the last ten years or so, audio versions of text-based media have become increasingly popular, especially for books. Audiobooks are a fast-growing segment, making it possible to "read" in the car, while doing housework, or even while exercising. Of course, reading a book aloud is nothing new (it has long been part of church services, for example), but audiobooks make this way of consuming text accessible to all.
One downside to audiobooks is that their production requires a specialized crew and actors. This additional production cost means that not all text-based media receives an audio version (certainly not the daily press and magazines). While there’s been notable growth in "listening" options at certain newspapers over the last couple of years, these are typically available only for a small selection of articles.
Today, AI is radically changing this situation. Switching from text to voice has become extremely easy and, more importantly, the quality has improved to an increasingly satisfactory level. AI voices are starting to sound natural and nearly indistinguishable from real humans! Consequently, publishers have started to use AI in their audio productions. The New York Times, for example, now offers all its articles in audio, as does the Dutch daily NRC.
To me, it seems inevitable that text and audio will continue to merge. When text becomes audio, articles become the equivalent of podcasts. In the future, improved AI translation capabilities will allow these media types to become multilingual, offering infinite possibilities for worldwide distribution.
When video and audio become text
I’ve observed that text is merging with audio, but the reverse is also true: audio can be converted into text using AI. The technology isn't quite there yet, but the results are encouraging. For example, Apple now offers a podcast transcription service, and similar AI tools that transcribe and summarize lectures or long videos are available online for free. With AI, the possibilities for converting audio and video into text seem endless.
When audio becomes video
One noteworthy trend went largely unnoticed in 2023: YouTube claimed the top spot among podcasting platforms, ahead of Spotify and Apple. At the same time, YouTube owner Alphabet discontinued its dedicated Google Podcasts platform.
YouTube is a prime example of a video platform increasingly embracing podcasts. Many podcasts on the platform are filmed in studios, resembling TV shows more than traditional audio broadcasts. Similarly, Spotify, which recently expanded its audio offerings to podcasts and audiobooks, is now expanding further to include video. As part of this expansion, creators are offered financial incentives to produce these new types of content.
What these developments tell us about the future
This reshuffling of media is nothing new. For the past ten years, streaming has revolutionized the worlds of audio, television and music. Now, the amalgamation of audio, text and video is forcing companies to confront a new reality. Each player is moving onto its competitors' turf, while at the same time suffering incursions of its own.
Above all, we’re witnessing a transformation that transcends traditional platform boundaries, whether video, audio, or otherwise. The implications of this industry-wide transformation for the future are still unclear.
What does this mean for publishers?
Up until now, audio has been an opportunity for publishers, and some publishers’ podcasts have made it to the top of the rankings. Monetization, on the other hand, has been more difficult: by 2023, investment in audio content creation had come to a standstill.
Now, the new possibilities offered by AI mean that publishers can start producing audio projects at a lower cost, especially as AI tools continue to improve in quality. Amid an industry-wide decline in the consumption of text-based media, this could be an opportunity for publishers similar to the one audiobooks created for the book industry.
It will also be interesting to see how this affects journalism, particularly in terms of writing. Will audio-based media change the way journalistic stories are composed? A similar question can be asked about video: in the next few years, will AI make it possible to build videos from journalistic articles? One thing is certain: the evolving media landscape will continue to challenge and redefine the way stories are told across many formats and platforms.