Opportunities and challenges of Extended Reality (XR)--Yang junchao

2020-10-02

Introduction

In recent years, virtual reality (VR), mixed reality (MR), and augmented reality (AR) have attracted great attention in academia and industry. The industry also refers to these three technologies collectively as Extended Reality (XR). Among them, VR provides a revolutionary next-generation method of immersive entertainment and interaction, while MR and AR are expected to free users from the single smartphone screen and enhance their visual experience. Essentially, VR, MR, and AR all result from mixing digital content with reality, and they differ in the mixing ratio. Both MR and AR must be implemented in the real environment around the user. AR pays more attention to the elements of the real world, while in MR the virtual elements play a leading role. Therefore, AR and MR glasses and wearable devices do not need to completely isolate the user from the external world; they only need to overlay digital content on part of the user's current view. To make the experience feel real, AR and MR need to build a 3D model of the environment and then place the virtual elements at suitable positions so that occlusion between virtual and real objects is handled correctly. In contrast, VR is a 100% virtual simulation experience: a virtual world in three-dimensional space created by computer technology. VR uses a head-mounted display (HMD) to completely cover the user's field of view (FoV), and the content shown on the screen changes in response to the user's eye and head movements.

AR based on scene understanding is currently the most widely used and most promising form of AR. The mobile game Pokémon Go, launched in 2016 by Niantic in collaboration with Nintendo and The Pokémon Company, uses AR technology. Players can find Pokémon in the real environment through the mobile phone screen and then capture them or battle with them. Since AR essentially superimposes virtual objects on the real world, object recognition and scene understanding play a vital role in AR applications and directly determine how real the final presentation feels.

MR relies on the powerful 3D rendering and interactive perception capabilities of the HMD to fuse holographic images with real life. In medical applications, 3D organ images can directly help doctors communicate with patients by showing the location of a lesion and the treatment options. In architectural planning, designers can visualize design ideas in the real world and display architectural styles and layouts, which facilitates commercial cooperation. Currently, the more mature devices for MR applications are the HoloLens launched by Microsoft and the Magic Leap One launched by Magic Leap.

Compared with AR and MR, VR has received more attention and wider adoption because of its revolutionary interaction methods. Applications such as VR live broadcasts of sports and large-scale events, VR travel, real estate video on demand and live broadcast, UGC (user-generated content) VR games on demand and live broadcast, and VR conferences have been widely used and promoted. It is predicted that by 2025, the VR application market will reach 30 billion US dollars. Recently, IT companies including Apple, Google, Facebook, and YouTube have also entered the VR market. As the most popular VR application at present, VR video uses 360-degree video to construct a three-dimensional virtual world in the HMD, and its strong immersion, interaction, and imagination bring people a revolutionary visual interactive experience. However, this new experience requires higher transmission bandwidth, higher video bitrate, and lower latency.

1. The main challenges currently facing XR 

1.1 Content production 

Since AR is mainly a superposition of virtual elements on the real world, its content production is relatively simple. VR and MR content production, however, places higher requirements on shooting equipment and image processing. As a result, most current content is produced by professional content production companies, which limits the popularity and development of XR to a certain extent.

VR video content production requires a panoramic camera, that is, a multi-lens camera that shoots image content in all directions and stitches the images together. Panoramic cameras currently on the market have from two to a dozen lenses. Generating a binocular (stereoscopic) panoramic video requires at least two lenses covering each direction. Some panoramic cameras are also equipped with depth measurement equipment such as lidar. Generating higher-resolution VR videos requires higher-resolution cameras or more camera lenses, so reducing the cost of high-resolution panoramic cameras is a challenge for the current development of VR. In addition, using computer graphics to generate VR content in the manner of 3D animation is a major challenge for VR in the future. Compared with panoramic shooting, the biggest advantage of computer-generated content is the convenience of controlling pacing and scheduling; with suitable textures, rendering, and lighting effects, its visual quality can approach that of real footage.

As with VR, MR requires a large amount of computer graphics to integrate digital images with the real environment, and the system must correctly handle the occlusion relationship between virtual and real objects. Currently, MR equipment including HoloLens and Magic Leap One is still at the exploration and research stage in holographic image processing and display. How to use the light field to generate digital light that blends seamlessly with natural light, so that digital images and the real environment are integrated, remains a challenging problem in the MR content production and display stage.
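The occlusion handling mentioned above can be illustrated with a per-pixel depth comparison. This is only a minimal sketch under stated assumptions, not the method used by HoloLens or Magic Leap One: the function name `composite`, the single-row pixel representation, and all depth values are hypothetical.

```python
from typing import List, Optional


def composite(real_color: List[str], real_depth: List[float],
              virt_color: List[Optional[str]], virt_depth: List[float]) -> List[str]:
    """Blend one row of pixels (hypothetical representation).

    A virtual pixel is shown only where a virtual object exists (color is
    not None) and it is closer to the viewer than the real surface.
    """
    out = []
    for rc, rd, vc, vd in zip(real_color, real_depth, virt_color, virt_depth):
        if vc is not None and vd < rd:
            out.append(vc)   # virtual object is in front of the real surface
        else:
            out.append(rc)   # real surface occludes (or no virtual object here)
    return out


# Illustrative values: a virtual cube 2 m away, in front of a wall at 3 m
# but behind a table at 1 m.
row = composite(
    real_color=["wall", "wall", "table", "table"],
    real_depth=[3.0, 3.0, 1.0, 1.0],
    virt_color=[None, "cube", "cube", None],
    virt_depth=[0.0, 2.0, 2.0, 0.0],
)
print(row)  # ['wall', 'cube', 'table', 'table']
```

The cube appears in front of the wall but is hidden behind the table, which is the behaviour a 3D model of the environment is needed for.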

1.2 GPU/CPU processing power

Both VR and MR need a high-performance GPU to render and process images in real time, so they place extremely high requirements on the GPU. The HTC VIVE needs a high-performance host computer for image rendering. Similarly, the Magic Leap One requires the Lightwear headset to be connected by a data cable to the high-performance portable Lightpack host. A high-performance GPU/CPU helps reduce image rendering and processing delay. However, as the requirements on XR content quality and processing delay keep rising, GPU/CPU processing power has become the key bottleneck for image rendering and processing delay.

2. Network transmission capacity

2.1 Meeting mobility needs of wireless networks 

Current XR (especially VR) content is mainly played back locally or transmitted over a wired connection. On the one hand, because of the limited rendering capability of current HMD hardware, most VR videos are first rendered on a capable computer and then transmitted to the HMD over a High-Definition Multimedia Interface (HDMI) cable for playback, which limits the application scenarios of VR to a certain extent. On the other hand, mobile VR devices on the market such as Samsung Gear VR and Google Cardboard use a smartphone for VR video rendering to make the HMD wireless, which partly solves the scenario limitations of wired HMDs. However, the insufficient rendering power and resolution of smartphones also affect the user experience. At present, XR video transmission can only support a basic XR experience, and there is still a huge gap between this and the high-quality XR experience that industry applications require, which undoubtedly limits the development and popularization of XR. Real-time wireless transmission of XR content is the key to XR mobility. In particular, with the continued rollout of 5G networks, how to use 5G to support the development of XR is a hot topic in the industry.

2.2 High data rate and low latency requirements

The ultimate goal of VR is that no clear-cut boundary can be drawn between the synthesized virtual world and the real world, so the resolution of VR systems keeps increasing toward the resolution of the human eye. 5G defines three application scenarios: enhanced mobile broadband (eMBB), massive machine-type communication (mMTC), and ultra-reliable low-latency communication (uRLLC). Wireless VR, MR, and AR are special applications in that they place extremely high demands on bandwidth, reliability, and latency at the same time: data must be sent to the user terminal at gigabits per second under a tight latency constraint (the motion-to-photon (MTP) latency of VR must be within 20 milliseconds, while MR requires a latency within 5 milliseconds). Low latency and high reliability are conflicting requirements. Ultra-reliability requires allocating more resources to a user to ensure the transmission success rate, but this increases the delay of other users. Therefore, to realize wireless VR/AR, intelligent network design is required to provide reliable, low-latency, and seamless support across different network scenarios.
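A rough back-of-the-envelope sketch can make the data-rate and latency/reliability tension concrete. Everything numeric here except the 20 ms and 5 ms budgets is an assumption chosen for illustration: the 7680x3840 frame size, 60 frames per second, 12 bits per pixel, the 5% per-attempt loss probability, the 4 ms per transmission attempt, and the simplification that the whole budget is spent on transmission (in practice rendering and display also consume part of it). The function names are hypothetical.

```python
def raw_bitrate_gbps(width: int, height: int, fps: int, bits_per_pixel: int) -> float:
    """Uncompressed video bitrate in gigabits per second."""
    return width * height * fps * bits_per_pixel / 1e9


def attempts_needed(loss_prob: float, target_reliability: float,
                    max_attempts: int = 50) -> int:
    """Smallest number of independent transmission attempts whose combined
    success probability (1 - loss_prob**k) reaches the target."""
    fail = 1.0
    for k in range(1, max_attempts + 1):
        fail *= loss_prob
        if 1.0 - fail >= target_reliability:
            return k
    raise ValueError("target reliability not reachable within max_attempts")


def fits_budget(loss_prob: float, target_reliability: float,
                attempt_ms: float, budget_ms: float):
    """Return (attempts, worst-case latency in ms, whether it fits the budget)."""
    k = attempts_needed(loss_prob, target_reliability)
    worst_case_ms = k * attempt_ms
    return k, worst_case_ms, worst_case_ms <= budget_ms


# Assumed 360-degree frame: 7680x3840, 60 fps, 12 bits per pixel, uncompressed.
print(round(raw_bitrate_gbps(7680, 3840, 60, 12), 2))  # 21.23

# Assumed link: 5% loss per attempt, 4 ms per attempt, 99.999% target.
print(fits_budget(0.05, 0.99999, 4.0, 20.0))  # (4, 16.0, True)
print(fits_budget(0.05, 0.99999, 4.0, 5.0))   # (4, 16.0, False)
```

Under these assumed numbers, the extra attempts needed for high reliability still fit a 20 ms budget but not a 5 ms one, which illustrates why reliability and latency pull against each other.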

3. Possible technological breakthrough points in the future

3.1 Highly integrated display device (HMD)

Current XR display equipment and image rendering and processing equipment are not fully integrated into one device but are connected through HDMI, which does not meet the requirements of light weight and mobility. Companies including Facebook and Magic Leap have been continuously exploring and improving highly integrated HMDs. It is believed that, with the continuous development of chip integration technology, wireless networking, environmental perception, image rendering and processing, and display capabilities will eventually be integrated into a lightweight HMD that provides users with a real-time, high-quality XR experience while satisfying mobility.

3.2 Content rendering and production based on computer graphics technology

As XR content quality (resolution) improves, the amount of computation needed for image display grows sharply. On the one hand, integrated circuit chips need to keep increasing computing power while reducing the cost per unit of computing power. On the other hand, graphics algorithms in the XR field need continuous improvement and optimization to reduce the computation required for XR image generation, so that high-quality XR content can be provided with computer graphics technology while the cost of XR content production is reduced.

3.3 XR content transmission based on D2D communication

XR's demand for high bandwidth undoubtedly puts huge pressure on limited spectrum resources. Device-to-device (D2D) communication lets users transmit data directly to one another without relaying it through the cellular base station, which relieves cellular band resources and greatly improves spectrum utilization. Meanwhile, resource sharing between adjacent users can provide a better user experience. Using D2D communication to improve the efficiency of XR content transmission and ensure users' quality of experience (QoE) may be a feasible solution in future XR application scenarios.
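The idea of serving requests from adjacent users can be sketched as a simple link-selection rule. This is a hypothetical illustration, not a standardized D2D procedure: the function `choose_links`, the 50 m range, the 2D positions, and the content identifiers are all assumptions.

```python
import math
from typing import Dict, Optional, Set, Tuple


def choose_links(positions: Dict[str, Tuple[float, float]],
                 caches: Dict[str, Set[str]],
                 requests: Dict[str, str],
                 d2d_range_m: float) -> Dict[str, Tuple[str, Optional[str]]]:
    """For each request, use D2D if a nearby user already holds the content,
    otherwise fall back to the cellular link."""
    decisions = {}
    for user, content in requests.items():
        ux, uy = positions[user]
        helper = None
        for other, (ox, oy) in positions.items():
            if other == user or content not in caches.get(other, set()):
                continue
            if math.hypot(ux - ox, uy - oy) <= d2d_range_m:
                helper = other  # first nearby holder found
                break
        if helper is not None:
            decisions[user] = ("d2d", helper)
        else:
            decisions[user] = ("cellular", None)
    return decisions


# Illustrative scenario: user B already holds a video tile that A and C want.
result = choose_links(
    positions={"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (200.0, 0.0)},
    caches={"B": {"tile_7"}},
    requests={"A": "tile_7", "C": "tile_7"},
    d2d_range_m=50.0,
)
print(result)  # {'A': ('d2d', 'B'), 'C': ('cellular', None)}
```

Only the request from A is offloaded from the cellular network here; C is too far from any user that holds the content.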

3.4 XR content transmission based on the cache transcoding mechanism of 5G mobile-edge computing

Mobile-edge computing (MEC) has been a focus of research in recent years. By deploying computing resources closer to users, it makes it possible to offload computation-intensive tasks from user devices. In XR video transmission, because the power and computing capability of user terminals are limited, offloading computation-intensive tasks such as decoding and rendering to the MEC is a feasible way to make XR head-mounted displays lightweight. Meanwhile, how to use MEC-based content caching and transcoding mechanisms for adaptive transmission to meet low-latency requirements is an important direction for future work. In future scenarios where multiple users request multiple video contents, it will be more challenging to use the limited MEC storage space and computing power for reasonable content caching and transcoding.
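One simple way to picture a cache-plus-transcoding decision at the edge is sketched below. It is a hypothetical policy for illustration only, not a mechanism specified in this article or in any 5G standard: the function `serve`, the cache layout (video identifier mapped to the set of cached bitrates in kbps), and the example values are all assumptions.

```python
from typing import Dict, Optional, Set, Tuple


def serve(cache: Dict[str, Set[int]], video_id: str,
          bitrate_kbps: int) -> Tuple[str, Optional[int]]:
    """Decide how an edge server answers a request for one bitrate version.

    1. If the exact version is cached, serve it directly.
    2. Else, if a higher-bitrate version is cached, transcode down from the
       lowest such version (least work to decode and re-encode).
    3. Otherwise, fetch the content from the origin server.
    """
    versions = cache.get(video_id, set())
    if bitrate_kbps in versions:
        return ("cache_hit", bitrate_kbps)
    higher = [v for v in versions if v > bitrate_kbps]
    if higher:
        return ("transcode", min(higher))
    return ("fetch_from_origin", None)


# Illustrative edge cache holding two versions of one video.
edge_cache = {"concert": {8000, 20000}}
print(serve(edge_cache, "concert", 8000))   # ('cache_hit', 8000)
print(serve(edge_cache, "concert", 4000))   # ('transcode', 8000)
print(serve(edge_cache, "concert", 40000))  # ('fetch_from_origin', None)
```

With limited storage and computing power, the open question raised above is which versions to keep in `edge_cache` and when transcoding is cheaper than fetching, which this sketch does not attempt to optimize.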

Conclusion 

This article outlines the main challenges faced in the development of XR and looks ahead to possible technological breakthroughs. In short, opportunities and challenges coexist in XR's future development. As video coding, computer graphics, and wireless transmission technologies gradually mature, XR applications are bound to develop toward stronger interactivity, richer content, and more convenient use, so that people can have an immersive interactive experience anytime and anywhere, bringing greater convenience and satisfaction to their work and life.

Yang Chaojun

A doctoral student in Information and Communication Engineering at the School of Communication and Information Engineering of Chongqing University of Posts and Telecommunications, and a jointly trained Ph.D. student at the School of Electronic Engineering of the University of Washington. Dr. Yang has long conducted research on virtual reality, 5G multimedia transmission optimization, and MEC-based intelligent transcoding optimization. Dr. Yang has published six SCI/EI papers as first author and one Chinese core journal paper, and has filed four patent applications.