Immersive Audio Still A Work In Progress For NextGen TV
Just as the ATSC 3.0 or “NextGen TV” standard gives broadcasters new video options over ATSC 1.0 (including the ability to deliver 4K UHD with High Dynamic Range (HDR) to TV sets or reliably transmit lower-resolution video such as 480p to mobile devices), it also enables a host of new audio features including immersive sound and personalized audio tracks. But just like the video side of the standard, broadcasters are still in the early days of exploiting 3.0’s audio capabilities.
To that point, none of the roughly 250 3.0 stations broadcasting across nearly 60 U.S. markets today are using the Dolby AC-4 audio technology included in the standard to deliver Dolby Atmos 5.1.4 audio, which offers additional overhead audio channels to create an immersive sound experience that makes viewers feel like they’re part of the action. Most are simply passing through the Dolby 5.1 surround sound from the Dolby AC-3 encoder used in their ATSC 1.0 broadcasts.
That includes Sinclair’s 3.0 stations, which are offering an improved picture today via 1080p video with “Advanced HDR by Technicolor” technology, which applies special processing to standard dynamic range (SDR) content to create HDR. Mark Aitken, Sinclair SVP of advanced technology, isn’t sure that immersive audio will ever be that important for local stations.
“I’m not sure there’s a big play for broadcasters in that immersive audio environment,” said Aitken. “Certainly, it’s not going to be used in local broadcast stations. News is almost entirely ‘center-stage’ with a personality, and so even stereo gets lost in that mix.”
Nor are any 3.0 stations currently using AC-4’s new object-based audio capabilities to deliver personalized audio streams to a TV set, such as a “home announcer” commentary feed for a sports broadcast, though a few stations have experimented with multiple language feeds.
Most of the delay in exploiting 3.0’s audio capabilities in the U.S. is attributable to the same factors that have held back the rollout of UHD content: there is still a small base of NextGen TV-capable sets; the distribution chain from networks to local stations is not fully ready for new formats; and broadcasters are somewhat hamstrung in experimenting with their 3.0 content because of FCC rules that require them to effectively simulcast their 1.0 programming. Rights considerations for high-value content like professional sports may also preclude 3.0 personalization features like alternative commentary for some time.
Some major U.S. broadcasters have produced live sports in immersive audio in conjunction with their UHD and HDR efforts, but distribution has been limited to special pay TV channels or streaming platforms. Industry insiders say the rest of the world is far ahead of North America in implementing immersive audio, whether it is Dolby Atmos or the competing MPEG-H standard developed by Germany’s Fraunhofer Institute.
“Europe is leading the charge here,” said Larry Schindel, senior product manager for audio processing vendor Telos Alliance, which makes AC-4 encoders for broadcast applications as well as up- and down-mixers used to blend stereo, surround (Dolby 5.1) and immersive audio in live production.
NBC has been the leader among U.S. broadcasters in pursuing immersive audio, said Schindel, producing a number of Notre Dame college football games in Dolby Atmos as well as providing the 2021 Tokyo Olympics in 4K HDR and Dolby Atmos to Comcast subscribers. Turner and ESPN have also experimented with some Atmos productions, for basketball and college football, respectively.
Terrestrial U.S. broadcasters can’t deliver immersive audio without going to 3.0 and AC-4. But many cable and satellite operators as well as streaming platforms like Netflix can deliver Dolby Atmos by using Dolby Digital Plus JOC (Joint Object Coding), a process by which Dolby Digital Plus with Atmos decoders receive a legacy 5.1 mix and sideband metadata and use them to reconstruct the original Atmos mix.
“That last mile has proven to be a bit of a bottleneck, but it is getting better for sure,” said Schindel.
An Early Winner
One new AC-4 capability that early 3.0 stations are taking full advantage of is “Voice Plus,” a dialogue enhancement technology developed by Dolby Laboratories specifically to address problems viewers have hearing dialogue amid music and other sounds in entertainment programming. Voice Plus was one of the most popular 3.0 features with consumers in early focus groups conducted by broadcast coalition Pearl TV in Phoenix, Ariz.
“The sound is so amazing today that’s being produced, but it’s harder and harder to hear the actors’ voices,” said Pearl TV managing director Anne Schelle.
Dolby promoted the dialogue enhancement technology in a series of commercial spots in 2021. Those same spots are running again this year in 25 NextGen TV markets reaching 30 million households, Schelle said.
AC-4’s ability to enhance dialogue in legacy content was one of the key reasons it was selected by North American broadcasters for the 3.0 standard, said Mathias Bendull, Dolby’s head of consumer audio playback, broadcast. He noted that unlike immersive audio or personalization, Voice Plus doesn’t demand any extra work by broadcasters.
“It doesn’t require any additional implementation or any additional content,” Bendull said. “It is just the way that the Dolby AC-4 encoder is configured that it delivers a signal that is capable in the AC-4 television sets to create this dialogue-enhanced version of the content that is processed through the TV station. That is a feature that consumers have very positively responded to.”
Another feature that is built into AC-4 is the ability to support high-quality audio description for the visually impaired. While the AC-3 codec used in 1.0 delivers audio description in mono or stereo audio, even if the program is in 5.1, AC-4 matches the quality of the audio description to the program’s overall audio quality whether it is stereo, surround or Atmos. Cable operators like Comcast that have tested AC-4’s audio description feature see that as a major accessibility benefit of 3.0, Bendull said.
Competing Standards
Looking globally, adoption of Dolby Atmos and MPEG-H varies based on geographic region. And makers of professional audio equipment like Lawo currently need to support both, as do many consumer set manufacturers.
“There are some parts of the world that have chosen Atmos, and others are favoring MPEG-H,” said Christian Struck, senior product manager of audio for Lawo. “Brazil, China and Korea have standardized on MPEG-H, while most parts of Europe and North America are on the Atmos track.”
Capabilities of the two standards for things like personalization and different language tracks are pretty similar, and Struck is hopeful that the industry will eventually standardize on one flavor of immersive and next-generation audio metadata.
Overall, broadcasters’ interest in immersive audio is picking up now that the COVID-19 pandemic situation has improved, Struck said. When COVID-19 first hit in 2020, he said, “broadcasters had to focus on staying on air, and so immersive audio was put on a backburner.”
Lawo started working on immersive audio back in 2011, doing special developments on 22.2-channel sound for the Japanese “Ultra-HD” consortium, and contributing to some early immersive trials during the 2014 Winter Games in Russia and the 2014 World Cup in Brazil. At that time broadcasters were starting to debate whether viewers would embrace audio personalization or would rather have “the best seat at the venue without leaving their homes,” Struck said.
While some broadcasters thought features like different languages or commentators would be the most compelling to viewers, the first step focused on providing the “kick that you get from a three-dimensional sonic image,” he said.
Since then, supporting the development of immersive audio in live production has been a “continuous process” for Lawo, said Struck, who noted that most immersive audio activity to date has been in Europe and Asia. BT Sport in the U.K. began broadcasting Premier League professional soccer in 4K and Dolby Atmos back in 2017. The Bundesliga soccer league in Germany has also adopted immersive audio for top matches broadcast on Sky, using a dedicated immersive audio control room in central Germany that remotely controls the audio processing equipment at the game venues.
There are two sides to immersive audio, Struck explained, the capturing side and the reproduction side. Capturing essentially involves more microphones and a tracking-based automated mixing system like Lawo’s KICK for compelling field-of-play noises. A suspended special microphone array, or “tree,” captures the crowd, announcements, etc., to provide the enveloping “sound bed,” while the other signals are arranged in the sound field based on other considerations.
The mixing console then transmits separate submixes, called “stems,” to the immersive processor for encoding. This approach is often channel-based, i.e., any setting change by the user affects one of the 5.1.4 (or more) channels. Next-generation audio, on the other hand, relies on “sound objects” whose levels can be set by viewers at home for a personalized experience, with less or no commentary or a louder or softer crowd.
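The idea of viewer-adjustable sound objects can be illustrated with a minimal Python sketch. It is a simplified illustration, not how an AC-4 or MPEG-H decoder actually renders audio: the object names (`commentary`, `crowd`), the helper functions and the single-channel sample lists are all hypothetical, and a real renderer would also handle object positions, speaker layouts and loudness limits.

```python
def db_to_gain(db):
    """Convert a level offset in decibels to a linear gain (-inf dB mutes)."""
    if db == float("-inf"):
        return 0.0
    return 10 ** (db / 20)


def render(objects, user_gains_db):
    """Mix named audio objects into one channel.

    objects:       dict mapping an object name to a list of samples
                   (all lists assumed to be the same length).
    user_gains_db: dict of per-object level offsets chosen by the viewer;
                   objects not listed play at their original level (0 dB).
    """
    length = len(next(iter(objects.values())))
    mix = [0.0] * length
    for name, samples in objects.items():
        gain = db_to_gain(user_gains_db.get(name, 0.0))
        for i, sample in enumerate(samples):
            mix[i] += gain * sample
    return mix


# Hypothetical two-object program: the viewer turns the commentary down
# by 20 dB and leaves the crowd untouched.
program = {"commentary": [1.0, 1.0], "crowd": [0.5, 0.5]}
quieter_commentary = render(program, {"commentary": -20.0})
no_commentary = render(program, {"commentary": float("-inf")})
```

The point of the sketch is that the final mix is assembled in the receiver, so the same broadcast can yield a different balance in every living room. In a channel-based delivery the commentary is already baked into the transmitted channels and cannot be adjusted on its own.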
In the end the bigger challenge is on the reproduction side, which requires the appropriate data to be sent through the broadcast chain to a consumer receiver connected to an audio system that can receive and decode the information and then accurately deliver the immersive effect, such as through the use of extra speakers mounted in the ceiling.
One of the reasons that personalization initially rated higher as a new audio capability was that it was something that could be supported by conventional stereo speakers and didn’t require new equipment in the living room. But with more people listening on binaural headphones, the development of soundbars, and directional speakers that use controlled sound reflections from the ceiling and walls to create an immersive effect, a big technical hurdle has been lifted, Struck said.
“All big manufacturers of speakers have very good sounding soundbars for comparatively little money, and it kind of becomes the new standard,” he said. “If you buy a home cinema system you immediately have up-firing speakers or a soundbar that is able to reproduce three-dimensional sound. So that has helped a lot. And people really feel the difference.”
Working in immersive audio gives broadcasters more freedom to place audio sources that in a conventional stereo or surround mix would probably wind up as front or center channels. There are also new considerations, such as the need to add a layer or two of downmixing to validate surround and/or stereo mixes for viewers who don’t have immersive home audio systems. This means more monitoring work for audio engineers who also need to check their metadata encoding to make sure it can be properly decoded on the receiver side.
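The layered downmixing described above can be sketched in a few lines of Python, assuming one sample per channel. The surround-to-stereo step uses the widely used convention of adding the center and surround channels at about -3 dB and leaving out the LFE channel. The height fold-down step, its gain and the channel labels (`Ltf`, `Rtf`, `Ltr`, `Rtr`) are illustrative assumptions, not values taken from the article or from any particular encoder; in practice the downmix coefficients may be set by the mixer or carried as metadata.

```python
import math

G = 1 / math.sqrt(2)  # roughly -3 dB


def fold_heights(ch, height_gain=G):
    """5.1.4 -> 5.1: fold each height channel into the bed channel beneath it.

    The mapping and height_gain are assumptions made for this sketch.
    """
    return {
        "L": ch["L"] + height_gain * ch["Ltf"],
        "R": ch["R"] + height_gain * ch["Rtf"],
        "C": ch["C"],
        "LFE": ch["LFE"],
        "Ls": ch["Ls"] + height_gain * ch["Ltr"],
        "Rs": ch["Rs"] + height_gain * ch["Rtr"],
    }


def downmix_stereo(ch):
    """5.1 -> stereo: center and surrounds added at about -3 dB, LFE omitted."""
    left = ch["L"] + G * ch["C"] + G * ch["Ls"]
    right = ch["R"] + G * ch["C"] + G * ch["Rs"]
    return left, right


# One sample of a 5.1.4 mix containing only center-channel dialogue.
sample = dict.fromkeys(
    ["L", "R", "C", "LFE", "Ls", "Rs", "Ltf", "Rtf", "Ltr", "Rtr"], 0.0
)
sample["C"] = 1.0
surround = fold_heights(sample)
stereo = downmix_stereo(surround)
```

Chaining the two functions mirrors the extra monitoring passes an audio engineer would make: listen to the immersive mix, then the 5.1 fold-down, then the stereo fold-down, checking that dialogue and key effects survive each step.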
Struck hopes to see a significant increase in Dolby Atmos coverage to complement 4K picture quality globally, but says it is still early days with immersive audio for most U.S. broadcasters.
“Immersive is a topic in pretty much every conversation, but it’s not the leading or driving factor,” Struck said. “Broadcasters currently focus on remote production, ST2110-based infrastructure utilization, and the flexibility provided by distributed architectures. But I’m confident immersive audio will take off in the near future.”
The amount of live sports being produced in immersive audio is certainly growing; Bendull said a dozen international broadcasters are using Dolby Atmos for their coverage of the 2022 FIFA World Cup. Dolby Atmos is also being delivered with AC-4 in over-the-air broadcasts in Poland today, and Dolby is currently working with some partners in the U.S. to demonstrate Dolby Atmos at a local station in the near future, probably early 2023.
“There is content in much higher quality available than what makes it through the pipe,” Bendull said. “So we are doing some work to demonstrate what terrestrial ATSC 3.0 experienced with immersive audio will sound like.”