Bonum Commune Communitatis: Unifying Machine Learning-Based Video Coding Solutions (How Machine Learning Is Used in Video Coding, Part 6)

At this point in the series, I think we can all agree that machine learning will play a major role in the future of video coding. Whether it replaces individual components of a standard codec or is used in an end-to-end way, machine learning will certainly be there.

Standardization is critical for video coding, as for any online application, because the system is used all over the world across hundreds, if not thousands, of different device types. The data must therefore follow a strict, standardized format to overcome this extreme heterogeneity. This is why standardization committees like the Moving Picture Experts Group (MPEG), the Joint Collaborative Team on Video Coding (JCT-VC), the Joint Video Experts Team (JVET), and the Joint Photographic Experts Group (JPEG) exist.

JPEG and MPEG are organized under ISO/IEC JTC 1/SC 29 (Coding of audio, picture, multimedia and hypermedia information). MPEG focuses on defining multimedia coding standards, covering video and audio compression, file formats for applications, and transmission. JPEG, on the other hand, focuses on the same aspects for still images. The role of JCT-VC and JVET is slightly different, as they were formed to design specific video coding standards: JCT-VC for High Efficiency Video Coding (HEVC) and JVET for Versatile Video Coding (VVC).

Over the past decades, these standardization committees have focused on improving the performance of video coding solutions. Now that machine learning is more commonly used in video coding, they are beginning to form new groups dedicated to these methods.

JPEG-AI became an official work item in 2021. It focuses on providing a learning-based image compression method that targets better visual quality and greater compression efficiency than current image coding standards. Furthermore, image coding for machines is also considered, for applications such as image processing and computer vision tasks.

MPEG has an activity on neural network compression, since efficient transmission of machine learning models will play a major role in video streaming. It is a relatively new group, driven by the growing importance of machine learning-based tools for applications such as video coding, classification, and descriptor extraction from video content. The first version of Neural Network Compression was released in 2021, and version 2 is on the way.

Furthermore, MPEG has an exploration activity on video coding for machines. Current video codecs are designed for human consumption. Today, however, most videos are analyzed by machines, and standard codecs are not a suitable solution for delivering video to them. The MPEG activity on Video Coding for Machines (VCM) aims to standardize the bitstream format that results from compressing both the video stream and previously extracted features that will be used in machine vision tasks.

There is also Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI), an organization independent of MPEG that aims to develop standards for artificial intelligence-based data coding.

AI-based End-to-End Video Coding (MPAI-EEV) is an MPAI project focused on end-to-end video coding using machine learning. The goal is to develop a method that compresses video with ML-based end-to-end coding techniques, without the limitations inherited from previous video coding standards. Another MPAI project, AI-Enhanced Video Coding (MPAI-EVC), focuses on improving the performance of traditional video codecs by replacing their components with machine learning-based methods.

Figure: ML-based video coding standardization activities and their start dates.

That is it for this blog post series. We started by introducing what video coding is and how video is delivered via HTTP Adaptive Streaming. We then covered how machine learning can be used to improve video coding performance, enhance the visual quality of decoded videos, and provide end-to-end coding solutions. We end the series with this post, which introduced the ongoing standardization work around ML-based video coding. I hope you enjoyed reading it, and I hope it gave you a glimpse into the wide world of video coding.


Akram Cetinkaya received his B.Sc. in 2018 and M.Sc. in 2019 from Ozyegin University, Istanbul, Turkey. He wrote his M.Sc. thesis on image denoising using deep convolutional networks. He is currently pursuing a Ph.D. degree at the University of Klagenfurt, Austria, and works as a researcher on the ATHENA project. His research interests include deep learning, computer vision, and multimedia networking.

