I.

Introduction to extended reality: AR, VR and MR

The history and evolution of extended reality

Throughout history, human beings have always looked for visual ways to express their imagination, creativity and desire to go beyond the physical world. The goal is to represent scenes, moments and experiences that allow others to experience them with all of their senses, offering the opportunity to realise dreams, ambitions and visions – or even to live in imaginary worlds.

With the support of technology, we can now create convincing, fully immersive sensory experiences. This is possible through the virtualisation and augmentation of our realities, or by combining both in a mixed environment. In this chapter we’re going to discuss extended reality (XR), which covers virtual reality (VR), augmented reality (AR) and mixed reality (MR). Before getting into definitions, let’s start with a brief history.

From the Great Hall of the Bulls to the Stereoscope and the Sensorama machine

There are many examples in history where we can find the desire to represent and experience things visually. We can start our journey in prehistoric times, going back about 15,000 years to Lascaux (a cave located in what is now France). There, humans created about 600 wall paintings of large animals in what is known as the “Great Hall of the Bulls”. This is one of the first surviving demonstrations of human beings’ ability to project realities beyond their individual experience at a given moment in time – thus allowing others to experience those realities through immersion in images generated by someone else.

We can find many other examples throughout history, but a key moment in using technology to create a device specifically meant to immerse the user in a virtual experience was Charles Wheatstone’s invention of the stereoscope in 1838. The stereoscope presented a separate image to each eye, creating the impression of a single 3D image. This was a breakthrough in creating a portable and personalised experience of virtual reality. Many people today will remember a modern descendant of the stereoscope, the View-Master, as a childhood toy.

A View-Master

Along the same lines, it’s also important to mention the Sensorama machine, one of the first machines with immersive multisensory technology. Created in 1962 by Morton Heilig, the Sensorama projected images in stereoscopic 3D format, combined with stereo sound, a tilting chair, wind and aromas. It is often considered the first VR system.

Moving forward to the early 1990s, the CAVE (Cave Automatic Virtual Environment) system was developed at the University of Illinois. In this immersive environment, multiple projectors are aimed at the walls of a room-sized area, and the user wears 3D glasses to experience virtual reality. This leap in simulated environments is used to this day in product engineering, flight simulation, construction planning and more.

Today, developmental leaps in the field of AR/VR/MR are moving towards more portable, realistic, personalised and cost-effective devices, making virtual reality a ubiquitous tool for a wide variety of industrial and personal uses. In the future, these tools and technologies will be as widespread as personal computers and other smart devices are today. Soon, it will be difficult to imagine our world before virtual reality.

Defining AR/VR/MR

Augmented reality

According to the Swiss Society of Virtual and Augmented Reality (SSVAR), “Augmented reality (AR) overlays digitally created content onto the user’s real-world environment. AR experiences can range from informational text overlaid on objects or locations to interactive photorealistic virtual objects. AR differs from mixed reality in that AR objects (e.g., graphics, sounds) are superimposed on, and not integrated into, the user’s environment.”

To better understand the concept of augmented reality, you just need to remember Pokémon GO. The game leads us to search for and capture "digital" creatures (not belonging to our physical world) that are added as layers (holograms) on top of the real world. Or take the Iron Man movies, for example, where the analogue world is enhanced with digital interfaces.

Virtual reality

“Virtual reality (VR) is a fully immersive user environment affecting or altering the sensory input(s) (e.g., sight, sound, touch, and smell) that can allow interaction with those sensory inputs based on the user’s engagement with the virtual world. Typically, but not exclusively, the interaction is via a head-mounted display, use of spatial or other audio, and/or motion controllers (with or without tactile input or feedback).” -SSVAR, 2021

To better understand what virtual reality is, we can use the film The Matrix as an example. VR covers contexts in which we are transported to virtual digital worlds, leaving our analogue (physical) world behind.

But how are these two technologies positioned relative to each other and to the real vs digital world? As we have seen, virtual reality relies on a fully computer-generated environment, while AR sits in between the computer-generated digital world and the real world.

Mixed reality

“Mixed reality (MR) seamlessly blends a user’s real-world environment with digitally created content, where both environments coexist to create a hybrid experience. In MR, the virtual objects behave in all aspects as if they are present in the real world – e.g., they are occluded by physical objects, their lighting is consistent with the actual light sources in the environment, they sound as though they are in the same space as the user. As the user interacts with the real and virtual objects, the virtual objects will reflect the changes in the environment as would any real object in the same space.” -SSVAR, 2021

MR is sometimes confused with AR, and vice versa, as both involve a mix of the real and digital worlds. The key difference is that, in an MR environment, we are able to interact with the digital objects – they are not just overlaid onto the real world, they become an integral part of it. In MR, the physical and digital worlds are interconnected and presented as a single reality.

Perhaps a better way to understand all these technologies in their proper context is through the “virtuality continuum”, a term coined by Paul Milgram, Haruo Takemura, Akira Utsumi and Fumio Kishino in 1994. The virtuality continuum is in essence a scale ranging from the purely physical real world at one end to a completely virtual environment at the other.

The following image also illustrates the differences between VR, AR and MR in their particular contexts of representation.

Image representing differences between VR, AR and MR

From left to right: VR, AR and MR

When defining VR, AR and MR, it’s important to remember where they fit within the broader framework of extended reality (XR). XR is the umbrella term that covers the three specific fields of AR, VR and MR.

Field of view

Field of view (FOV) is an important concept in XR, whether we’re speaking about VR, AR or MR, because it shapes how we experience it. FOV determines how much we can see at once, and this has a great impact on how we feel and how we internalise the experience. In essence, FOV is the amount of the observable world that is visible at any given moment, measured in degrees. It matters not only in terms of quantity (how wide an angle it covers) but also quality (how clear and undistorted the image remains across that angle). This applies to the equipment we use in XR, be it VR headsets or MR/AR glasses. It’s therefore important to check what the different types of XR hardware offer in this particular respect.
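As a rough illustration of how FOV relates to hardware geometry, the sketch below estimates the horizontal FOV of a flat display viewed from a given distance. It assumes a simplified, lens-free model (real headset optics magnify and distort the image, so actual numbers differ), and the function name and figures are illustrative only.

```typescript
// A minimal sketch: horizontal FOV of a flat display of a given width
// viewed from a given distance. Assumes a simplified, lens-free model;
// real HMD optics magnify and distort, so actual headset FOVs differ.
function horizontalFovDegrees(displayWidthMm: number, eyeDistanceMm: number): number {
  const halfAngle = Math.atan(displayWidthMm / 2 / eyeDistanceMm); // radians
  return (2 * halfAngle * 180) / Math.PI; // convert to degrees
}

// Illustrative numbers only: a 120 mm wide panel viewed from 50 mm
// gives roughly 100 degrees of horizontal FOV.
console.log(horizontalFovDegrees(120, 50).toFixed(1)); // "100.4"
```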

Image showing the difference between 3 and 6 degrees of freedom

Degrees of freedom (DoF)

There are two common degrees-of-freedom (DoF) configurations that define the quality and level of immersion of VR and AR experiences: three degrees of freedom (3DoF) and six degrees of freedom (6DoF).

These degrees of freedom are provided by the headset – or, more precisely, by the whole system that supports the experience. So, when using a VR or AR headset we should check the level of DoF it allows, because this will have a huge impact on the type of immersive experience we will have.

3 degrees of freedom (3DoF)

3DoF systems recognise three movements – the system tracks rotational motion around three axes (pitch, yaw and roll) but not translational motion. The system does not detect the user’s physical displacement; it only recognises head rotation. This means that the user’s movement will not be reflected in the virtual world if they walk, jump or step to the side.

3DoF headset example: Oculus Go

6 degrees of freedom (6DoF)

6DoF systems recognise six movements – the system tracks both rotational and translational motion of a body in 3D space. Using the convention where the X axis points forwards, Y points up and Z points sideways, having 6DoF in a virtual experience means that the user can “rotate”:

  • Tilting the head up and down (rotating in the X-Y plane) - pitch

  • Turning the head left and right (rotating in the X-Z plane) - yaw

  • Tilting the head from side to side (rotating in the Z-Y plane) - roll

and the user can also “translate”:

  • Moving up and down along the Y axis - heaving

  • Moving forwards and backwards along the X axis - surging

  • Moving left and right along the Z axis - swaying

This means that the user’s movements will be reflected in the virtual world, not only if they move their head around, but also if they walk, jump or move to the side.
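To make the distinction concrete, here is a minimal sketch of the pose data a tracking system could report, using the axis convention above (X forwards, Y up, Z sideways – note that engines differ; Unity, for example, treats Z as forwards). The type and field names are illustrative assumptions, not any vendor’s API.

```typescript
// A minimal sketch of the pose data a tracking system could report,
// using this chapter's convention: X forwards, Y up, Z sideways.
// All type and field names are illustrative, not a vendor API.
interface Pose3DoF {
  pitch: number; // rotation in the X-Y plane, degrees
  yaw: number;   // rotation in the X-Z plane, degrees
  roll: number;  // rotation in the Z-Y plane, degrees
}

interface Pose6DoF extends Pose3DoF {
  surge: number; // translation along X (forwards/backwards), metres
  heave: number; // translation along Y (up/down), metres
  sway: number;  // translation along Z (left/right), metres
}

// A 3DoF headset fills in only the rotational fields; a 6DoF system also
// tracks where the head is, so walking or jumping changes the pose.
const pose: Pose6DoF = { pitch: -10, yaw: 45, roll: 0, surge: 0.3, heave: 1.7, sway: 0.0 };
console.log(pose);
```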

6DoF headset example: Microsoft HoloLens 2

How does VR/AR work?

But how can we experience VR, AR or MR? What do we need to be able to be transported from our physical world to virtual ones, and how are we able to overlap digital objects with the real world and even be able to interact with and manipulate them? In other words, what hardware and software do we need?

The answer depends, of course, on how deep and “real” we expect the experience to be. If you have a smartphone (with some basic sensors such as an accelerometer and a gyroscope for VR), you can most likely have a VR or AR experience. For a VR experience, you just need a device such as Google Cardboard, plus a VR app or a WebVR/WebXR experience delivered through the browser.
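As a quick illustration of the browser route, the sketch below checks whether the current browser can offer an immersive VR session via the WebXR Device API (the successor to the original WebVR API). It assumes a WebXR-capable browser on a secure (https) page; in a real project the WebXR type definitions would come from a package such as @types/webxr.

```typescript
// A minimal sketch: asking the browser whether it can offer an immersive
// VR session via the WebXR Device API (the successor to WebVR).
// Assumes a secure (https) context and a WebXR-capable browser.
async function checkVrSupport(): Promise<void> {
  const xr = (navigator as any).xr; // untyped here for brevity
  if (!xr) {
    console.log("WebXR is not available in this browser.");
    return;
  }
  const supported: boolean = await xr.isSessionSupported("immersive-vr");
  console.log(supported ? "Immersive VR is supported." : "No immersive VR device found.");
}

checkVrSupport();
```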

Your smartphone is a powerful device that easily allows you to experience and alter the physical world around you. Nevertheless, the quality of experience possible with a mobile device differs from that of a system designed and built exclusively to provide a VR or AR experience, such as a head-mounted display (HMD). Take, for example, the next two images, which illustrate the use of a smartphone to experience VR and AR.

Use of a smartphone to experience VR and AR

Hardware and software requirements

Experiencing virtual, augmented or mixed reality in an immersive way requires the correct combination of hardware and software. The levels of immersion, experience, sensation and realism depend on this combination. Smartphones provide a less immersive experience; at the other end of the scale, a high level of immersion and sensation is possible with dedicated devices, typically aimed at business or gaming use.

Hardware devices for AR/VR and MR experiences

As far as hardware is concerned, there are many different devices and objects that facilitate virtual experiences. Their use depends on the specific context in which they are implemented. For example, personal use (like visiting other places through 360-degree videos) requires only a modest level of complexity. Things get more complex if we want to play games in a virtual environment, and a completely different level of complexity is required when creating a business training environment.

Head-mounted displays (HMD)

According to Wikipedia, “a head-mounted display (HMD) is a display device, worn on the head or as part of a helmet, that has a small display optic in front of one (monocular HMD) or each eye (binocular HMD). A HMD has many uses including gaming, aviation, engineering, and medicine. Virtual reality headsets are HMDs combined with IMUs. There is also an optical head-mounted display (OHMD), which is a wearable display that can reflect projected images and allows a user to see through it.”

There are two types of HMD in terms of how they operate:

  • Mobile, which does not need a connection to another device
    Examples include Oculus Go/Quest and Google Daydream

  • Tethered, which requires a connection to a PC or video game console
    Examples include Oculus Rift S and HTC Vive

Cardboard displays

A cardboard display allows the user to experience virtual reality in an economical and accessible format with the use of a smartphone and virtual reality apps.

An example of this is Google Cardboard.

AR glasses

In essence, these are glasses with the capacity and functionality to let users experience augmented reality. They exist in many formats and at different levels of use (differing mainly in processing capacity, graphics characteristics and price).

Examples include Google Glass Enterprise and Vuzix Blade Smart Glasses.

Mixed reality devices

MR devices allow an immersive experience where the real world and the virtual (digital) world are combined. The user can interact in the real world with virtual objects as if they were real, and can touch them, resize them and more.

Examples include Microsoft HoloLens 2, Magic Leap ONE and NReal.

Heads-up display (HUD)

A heads-up display is essentially a transparent display that projects digital information over the user’s view of the analogue world, enhancing it.

Car heads-up display with driver-assistance data

Haptics

In a VR/AR experience, haptics take immersion to another level. In addition to sound and sight, haptics allow you to feel and touch. Examples of haptic devices include digital gloves, as well as seats and motion platforms integrated with VR/AR solutions.

Creating virtual experiences

In order for us to experience other worlds and interact with those worlds we need the hardware platform (as discussed in the previous section) in combination with a software platform. The software is what defines the interactions and experiences in worlds that overlap with ours or in worlds where we disconnect from our reality and immerse ourselves in new digital realities.

Human machine interaction

A quick look at the evolution of the human machine interface (HMI) is important to help us understand how humans have developed their interaction with the digital world. According to Robert Scoble and Shel Israel in their book “The 4th Transformation: How AR and AI Change Everything”, there are four key moments in this evolution:

  1. Typing (text)

  2. Point-and-click (mouse)

  3. Touch (smartphones – the current dominant form)

  4. Natural interaction (MR glasses – the future of interaction?)

This leads us to the consideration that our brains operate and interact in 3D – this is our natural way of interacting with the world around us. In truth, we are not natural consumers of 2D screens; we have simply become used to them. Now that we have the possibility of interacting in 3D, we are going back to our roots of natural interaction – even if, in this particular context, the interaction is digital.

Spatial computing

According to Simon Greenwold (2003), spatial computing is "human interaction with a machine in which the machine retains and manipulates referents to real objects and spaces". Spatial computing is the crucial step in moving from 2D interactions to 3D interactions. Shashi Shekhar and Pamela Vold in the book Spatial Computing (MIT Press, 2019) define it as “...a set of ideas and technologies that transform our lives by understanding the physical world, knowing and communicating our relation to places in that world, and navigating through those places.”

When it comes to extended reality, this means that the system has a notion of the space that surrounds it. Basically, the system uses the space around the user as a canvas for the interface. In essence, it takes the user’s interactions (such as body movements, gestures or other data sources) as inputs for digital interactions, in combination with the physical space. Spatial computing allows the blending of the real world and the digital one. It can also be seen as the framework of software and hardware that supports XR experiences.
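As an illustration of what “using the space around the user as a canvas” can mean in practice, here is a minimal sketch using the WebXR hit-test module, where a ray cast from the device finds real-world surfaces on which digital content could be anchored. This is a sketch under assumptions (an AR-capable browser; rendering setup omitted; in a real project the WebXR types would come from @types/webxr), not a complete application.

```typescript
// A minimal sketch of spatial computing in the browser: the WebXR
// hit-test module casts a ray from the device into the real world and
// reports surfaces (floors, tables) where virtual content could be
// anchored. Rendering setup is omitted; types are loosened for brevity.
async function placeOnRealSurface(): Promise<void> {
  const xr = (navigator as any).xr;
  // Ask for an AR session that can ray-cast against real-world geometry.
  const session = await xr.requestSession("immersive-ar", {
    requiredFeatures: ["hit-test"],
  });

  const viewerSpace = await session.requestReferenceSpace("viewer");
  const hitTestSource = await session.requestHitTestSource({ space: viewerSpace });

  session.requestAnimationFrame((_time: number, frame: any) => {
    // Each frame, ask where the viewer's forward ray meets a detected surface.
    const hits = frame.getHitTestResults(hitTestSource);
    if (hits.length > 0) {
      // The first result carries a pose in physical space where a virtual
      // object could be placed so that it appears to rest on the surface.
      console.log("Real-world surface found: content could be anchored here.");
    }
  });
}
```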

Nowadays, spatial computing is being taken to new levels, creating new functionalities and capabilities in the XR universe through the evolution of 3D imaging techniques, AR/VR headsets (or hybrid gear that mixes both), AR glasses and haptic gear, all of which make our interaction with these new realities more natural and authentic.

Note

Interacting with analogue and digital devices in the same context is driving us to a concept known as “digital twins”. A digital twin is a digital replica of a living or non-living physical entity.
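As a toy illustration of the idea, the sketch below models a digital twin as a typed record kept in sync with telemetry from its physical counterpart. The entity, fields and function here are all illustrative assumptions, not a standard schema.

```typescript
// A minimal sketch of a digital twin as a data structure: a digital
// record that mirrors the live state of a physical entity. All field
// names are illustrative assumptions, not a standard schema.
interface MachineTwin {
  id: string;           // identifier of the physical machine
  temperatureC: number; // latest sensor reading from the real device
  vibrationHz: number;
  lastSyncedAt: Date;   // when the replica last matched reality
}

// Updating the twin whenever the physical machine reports new telemetry
// keeps the digital replica consistent with its real-world counterpart.
function syncTwin(twin: MachineTwin, temperatureC: number, vibrationHz: number): MachineTwin {
  return { ...twin, temperatureC, vibrationHz, lastSyncedAt: new Date() };
}
```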

Device and platform dependencies

Another important issue is what type of VR/AR experience we will create, and which platforms and devices we are targeting. All of this is critical when deciding what we will need to create the best possible product (experience).

The whole process is complex and calls for different skills, competences and fields of expertise. We are talking about interdisciplinary teams working in collaborative processes, supported by experts in different areas of technological knowledge.

This is a summary of some software tools and platforms used in the development of VR and AR projects:

  • 3D modelling / scanning: Blender, 3ds Max, MODO, Maya, SketchUp

  • VR and AR platform development: Unity, Unreal, Amazon Sumerian

  • Software development kits (SDKs) / frameworks: ARKit, Cardboard SDK, Oculus SDK, Windows Mixed Reality, ARCore, React 360, Wikitude, OpenVR, Vuforia, VRTK

  • Web environment: A-Frame, WebXR API, AR.js

Next section
II. AR/VR in our daily life