GraphicsBlast
The Basics | C++ | Linux | Command Line

Lesson 8: Movement

Exploration: In this tutorial we look at how we can use the mouse to look around and move within our world.

Previously, we saw how we could use the view matrix to model a camera and perform some very basic movement of it in our virtual world.

In this tutorial we'll take this idea a step further and add the ability to control the orientation of the camera with the mouse, allowing us to look around. We'll then alter our keyboard code so that when we move around, this movement is relative to the direction the camera is facing, instead of simply moving along the world axes.

Rotation Refresher

Mathematically, there are several ways we can represent the rotation of our camera, and of objects in general. The most common, and the one we'll use here, is Tait–Bryan angles. These are more commonly referred to by the names of the individual rotations, namely an object's "roll", "pitch", and "yaw".

Visualising our roll, pitch, and yaw axes of rotation

A change of the camera's pitch is analogous to tilting the camera up and down. This represents a rotation around the Y axis, visualised in green in the image. Remember, we've defined our coordinate system with the X axis pointing forward, the Y axis left, and the Z axis upwards.

Similarly, the camera's yaw angle represents its rotation around the vertical Z axis (blue in the image), in effect allowing us to look left or right. From these two types of rotation alone, pitch and yaw, we can model most common camera systems, for example first- and third-person cameras.

We can of course also rotate the camera in its X, or roll, axis (red in the image). This kind of rotation is analogous to rotating your phone from taking a portrait photo of something to landscape. It remains pointing at the same object, but the image spins about its centre.

This kind of rotation is used less often for cameras than pitch and yaw, generally only for special camera effects. Because of this, and for simplicity, we'll focus on modelling just the camera's pitch and yaw rotation for now.

Looking around

In essence, to use the mouse to look around our world, we'll capture any mouse movement events, and then use that movement to apply a rotation to our view matrix - in effect rotating our camera within our world.

If we capture this event in our handleEvents function, we can extract the number of pixels the mouse has moved left/right and up/down across our window.

We'll maintain two variables containing the camera's current pitch and yaw rotation. We can then use the up/down mouse movement to update our camera's pitch, and the left/right movement to alter its yaw. Then when we draw to our window, rather than setting the camera's lookAt target at a fixed point along the X axis, we can adjust its location based on the pitch and yaw variables, in effect giving us the ability to look around with the camera based on the mouse's movement.

In an ideal world, we could take the number of pixels the mouse has moved, and simply add or subtract that number from the pitch or yaw, so one pixel of movement would apply one degree of rotation to our camera. Unfortunately, the end result would be far too intense. Moving the mouse a few hundred pixels, only a fraction of a typical screen's width, would whip the camera through large rotations, making precise control difficult for the user.

As a result, it's fairly common to apply a sensitivity factor to our mouse movement to make the rotations more reasonable. We can take our mouse deltas (delta commonly meaning the amount something has changed by, in this case the number of pixels moved) and multiply these values by a scaling factor. A bigger scaling factor means we're more sensitive to mouse movement - the camera rotates more per pixel of movement - while a smaller factor means the camera rotates less when the mouse is moved.
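In other words, for each motion event:

change in angle (degrees) = pixels moved * sensitivity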

Camera Target Trigonometry

Yaw

For now, let's ignore the pitch rotation, and concentrate only on the camera's yaw (looking left/right).

What we want to do is take the camera's lookAt target, and rotate it around the camera's position based on the value of the yaw angle. Conceptually, we have angles and circles and unknown positions, so hopefully this should immediately strike you as a trigonometry problem.

For me, the easiest way to mentally work through the problem is to imagine how the camera's lookAt target position would move based on various possible values of the yaw. Remember, our rotations follow the right-hand rule, so a positive yaw should rotate our camera to the left.

If we concentrate on the X axis for a moment, we can see that we need a function that gives +1 when the input (yaw angle) is 0°, 0 at 90°, -1 at 180°, and 0 again at 270°. That's a cosine wave! Likewise for the Y axis, it's fairly similar, but for an input of zero it should return zero, then give +1 at 90°, 0 at 180°, and -1 at 270°. Using the same logic, we can see that the Y axis is just a sine wave of the yaw.

Therefore, we can set the camera's target position in the following way:
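targetX = x + cos(yaw)
targetY = y + sin(yaw)
targetZ = z

(With the pitch ignored, the Z coordinate is simply left unchanged.)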

Setting up our code in such a way would allow us to rotate our camera to look around our world according to whatever value we set the yaw angle to.

Pitch

We can apply a similar logic to also factor in the camera's pitch. Unlike the yaw though, we generally don't allow the camera to pitch through a full 360°. In fact, we generally limit it to +/- 90°, i.e. from looking vertically downwards to vertically upwards.

The reason for this is not a limitation of the mathematics or code, but simply stylistic. It doesn't make sense for a first-person view to look upwards beyond vertical - the camera would become inverted, which isn't realistically possible outside of flight simulators. Therefore we'll limit our camera's pitch in this tutorial, and only consider it in the range from vertically downwards to vertically upwards.

Using the same methodology, we need a function which gives 0 at 0°, -1 at 90°, and +1 at -90°. No basic trigonometric function fits these properties directly, but if we invert (negate) a sine wave, we get exactly what we need. So we can get the target's Z position by subtracting the sine of the pitch angle.
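targetZ = z - sin(pitch)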

There are two important points here though.

First, as we said, we will prevent the camera going beyond +/- 90° of pitch, vertically up or down. There's nothing inherently wrong with going beyond this limit, but currently when calculating our view matrix we set the up vector to +1 in the Z axis. If we were to look further upwards than the vertical axis, our camera should become inverted. If you're struggling to imagine this, just keep rotating your head upwards until you're looking behind yourself - the world will be upside-down!

In this case, our up vector should be flipped to point downwards if we pitch beyond 90°, but ours is hard-coded to always point up. It's easier, and frankly usually more realistic in most scenarios, to instead limit how far we can look up or down. Therefore, I'm going to simply prevent our pitch variable from ever reaching +/- 90°.

Ensuring the pitch is always less than +/- 90° also avoids an edge case: if we were looking exactly vertically up or down, our up vector and our view direction would point along exactly the same axis, and the orientation of the camera would become undefined. Again, we could fix this with some clever code, or we can just limit the pitch range.

The second point is that if you look at the above maths closely, you will see an issue. At 90° pitch, we need to be looking vertically downwards. That means the target's Z value (relative to the camera) should be -1, but also that the X and Y offsets need to drop to zero to make us look perfectly downwards. Currently though, our X and Y offsets always have a combined magnitude of 1, i.e. they always assume the pitch is zero. Therefore we need to scale the target's X and Y coordinates by the pitch to fully integrate its rotation.

For this scale factor, we need a function that doesn't affect the X and Y coordinates when the pitch is zero, i.e. multiplies them by 1, but which scales them towards zero as the pitch moves towards +/- 90°. That's the cosine of the pitch. This gives us a final set of equations for rotating our camera using pitch and yaw:
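targetX = x + cos(yaw) * cos(pitch)
targetY = y + sin(yaw) * cos(pitch)
targetZ = z - sin(pitch)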

Building our look-at target

To start implementing this system, let's begin by setting up the variable to control how sensitive our mouse movement is:

int windowWidth = 1024;
int windowHeight = 600;

+ float mouseSensitivity = 0.3;

SDL_Window* window = NULL;
SDL_GLContext context = NULL;

I've set this up at the start of our code where it can be easily configured.

Adjusting our mouse sensitivity is then just a matter of scaling our mouse deltas by this variable. For me, a value of approximately 0.3 gives good, balanced control of the camera. Feel free to adjust this; some people prefer a more sensitive mouse, some less. Some programs even make this a configurable parameter and let the user adjust it themselves.

We also need to set up the variables to hold the camera's current pitch and yaw angles:

float x = 0;
float y = 0;
float z = 0;
+ float pitch = 0;
+ float yaw = 0;

Shader mainShader;

Initialising the pitch to zero means the camera is neither looking up nor down when the program starts, but directly at the horizon. Likewise, zero yaw means no left/right rotation at start-up, so the camera will begin by facing in exactly the same direction as in the last lesson - along the positive X axis.

Let's now write the code in our handleEvents function to update the pitch and yaw whenever any mouse movement occurs:

        else if(event.type == SDL_EVENT_WINDOW_RESIZED)
        {
            windowWidth = event.window.data1;
            windowHeight = event.window.data2;
            glViewport(0, 0, windowWidth, windowHeight);
        }
+       else if(event.type == SDL_EVENT_MOUSE_MOTION)
+       {
+           pitch += event.motion.yrel * mouseSensitivity;
+           yaw -= event.motion.xrel * mouseSensitivity;

            ...

We can capture mouse movements by checking for events of type SDL_EVENT_MOUSE_MOTION, just as we've checked for all the other kinds of events. When we detect this event, we can access its relative movement values - the number of pixels on screen the mouse has moved up/down and left/right. These are available to us in the event.motion.xrel and event.motion.yrel properties. There are also other properties available, such as the mouse's new absolute coordinates in our window, but the relative motion is what we're interested in for first-person camera control.

These movements are reported as floating point values, as certain effects (like display scaling) can result in non-integer pixel movements. Both the absolute and relative coordinates of the mouse are reported in the window's coordinate system. This means the origin is located in the top-left corner of the window, with the X axis extending to the right and the Y axis downwards. Therefore, if the mouse is moved upwards, we will see a negative relative value for the Y axis.

Let's think about how we want these movements to be mapped onto our camera as a rotation. When the mouse is moved downwards, we are given a positive value for the Y movement. From the right-hand rule, we know that a positive change in the pitch will make us look downwards. Well that aligns nicely. If we just add the result, downwards mouse movement will rotate our camera downwards. So for every pixel the mouse moves in the Y axis, we multiply it by our sensitivity and add the result to our camera's pitch.

The inverse of course also holds true, so moving the mouse upwards will give us a negative movement in the Y axis, which will result in a subtraction from the pitch, angling our camera upwards.

For the yaw things are a little bit more tricky. Moving our mouse to the right will give us a positive mouse movement. But a positive yaw rotation should rotate our camera to the left.

As there is a mismatch in the direction, we need to invert the result of the yaw calculation, meaning leftwards mouse movement, which is negative, increases our yaw angle. Again we multiply this value by the mouseSensitivity variable to provide a better mapping between the number of pixels the mouse has moved and the change in angle, which should be in degrees. As we have mouseSensitivity fixed at 0.3, this means 1 pixel of movement will correspond to 0.3 degrees of rotation.

The function finishes up by making sure both variables remain within certain bounds:

        else if(event.type == SDL_EVENT_MOUSE_MOTION)
        {
            pitch += event.motion.yrel * mouseSensitivity;
            yaw -= event.motion.xrel * mouseSensitivity;

+           if(yaw > 360)
+               yaw -= 360;
+           else if(yaw < 0)
+               yaw += 360;
+
+           if(pitch > 30)
+               pitch = 30;
+           else if(pitch < -30)
+               pitch = -30;
+       }
        else if(event.type == SDL_EVENT_KEY_DOWN)
        {

We bound the yaw angle to between 0° and 360° mainly for debugging reasons. A yaw rotation of 500° would still produce the correct mathematical output, but is far harder to mentally picture than the equivalent yaw of 140°, both of which face our camera in exactly the same direction. Moreover, bounding the yaw prevents possible floating-point precision issues if the variable were to grow very large during a long session.

Similarly, for the pitch, we limit the range to plus or minus 30 degrees. 30° is my arbitrary choice for limiting up/down motion here; you can extend it if you wish. Just remember that you will begin to hit issues at 90° without writing additional code to compensate the up vector, so in our case we need to make sure the pitch can never reach 90°. Try it without these limits if you want to see what happens!

With our pitch and yaw angles properly set from mouse movement, we can now incorporate them into our view matrix:

void draw()
{
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

    glm::mat4 pMatrix = glm::perspective(1.0f, (float) windowWidth / windowHeight, 0.1f, 100.0f);

+   float yawRadians = yaw * 3.1415 / 180;
+   float pitchRadians = pitch * 3.1415 / 180;

+   float targetX = x + (cos(yawRadians) * cos(pitchRadians));
+   float targetY = y + (sin(yawRadians) * cos(pitchRadians));
+   float targetZ = z - sin(pitchRadians);

    ...

We calculate each component of the camera's target position using the formulae we previously derived. As the C++ maths functions work exclusively with radians, we start by converting our yaw and pitch angles into them.

Radians are a more mathematically elegant way of expressing angles than degrees, which were arbitrarily defined as 1/360th of a circle. Rather than this arbitrary division, a radian is 1/(2π) of a full circle, which makes some more advanced mathematics a bit easier, so mathematicians often prefer them. One radian is equal to about 57°.

We can convert our degrees to radians by multiplying by π and then dividing by 180. So the first two lines perform this conversion, and the next three are our formulae for setting the lookAt position, exactly as we derived before.
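For example, a yaw of 140° becomes 140 * π / 180 ≈ 2.44 radians.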

Looking at this, you may be wondering why we don't just use radians everywhere - why not store the camera's pitch and yaw as radians, and exclusively use this mathematically "preferred" system? The answer is simplicity. Most people can imagine approximately what an angle of 140° looks like, but if I tell you to imagine a rotation of 2.4 radians, you will probably struggle. So while radians are preferred for working with equations, for debugging - and anywhere you deal with humans - degrees are much preferred!

The resulting coordinates can then be fed into our existing call to create the view matrix:

    float targetX = x + (cos(yawRadians) * cos(pitchRadians));
    float targetY = y + (sin(yawRadians) * cos(pitchRadians));
    float targetZ = z - sin(pitchRadians);

+   glm::vec3 target = glm::vec3(targetX, targetY, targetZ);
+   glm::mat4 vMatrix = glm::lookAt(glm::vec3(x, y, z), target, glm::vec3(0, 0, 1));

    mainShader.bind();

This code simply creates a GLM vec3 called target from the three values we just calculated, which is then passed into the same glm::lookAt call as before.

Great! So with that in place, you should now be able to compile and run the program, and experience looking around by moving your mouse.

Camera Movement

If you did compile your program, you may also have noticed that if you try to move around, the direction of movement is fixed and still aligned to the world axes. Moving "forwards" will always move you towards the positive X axis, the direction you were looking when the program started, but not necessarily the direction you're looking now. That's what we'll solve next.

Currently, pressing the "W" key or the up arrow repositions our camera one unit further along the world's X axis. To make it so we always move in the direction the camera is facing, we again need to pull in our knowledge of trigonometry. As before, to derive the formulae I think the best approach is to take a series of inputs, imagine what the outputs should be, and then figure out the underlying function.

To start off with, we'll focus on forwards movement.

First of all, we can conclude that forwards movement completely ignores the camera's pitch angle. If we assume we are walking on flat ground, having the camera pitched slightly up or down won't affect what moving forwards does. Only the camera's yaw matters for what we calculate here.

Examining what happens when we move forward for various yaw inputs, we can observe the following:
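Yaw     Change in X     Change in Y
0°      +1              0
90°     0               +1
180°    -1              0
270°    0               -1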

So to move 1 unit forwards, we can add cosine(yaw) to the X position, and sine(yaw) to the Y position. Both of these results can of course be scaled to take bigger or smaller steps.

Moving backwards is really easy: we simply do the reverse of what we did to move forwards. So instead of adding the above values, we subtract them to get the desired effect.

Then we need to consider left/right movement. Just like forwards and backwards, these are the inverse of each other, so we only really need to figure out one. I'll start by assuming the left key has been pressed:
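Yaw     Change in X     Change in Y
0°      0               +1
90°     -1              0
180°    0               -1
270°    +1              0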

We can see that the change in the X axis follows a negative sine wave of the yaw, and the Y axis a cosine wave. Therefore a left movement subtracts the sine of the yaw from the X axis, and adds the cosine of the yaw to the Y axis. To move right, we simply do the opposite: add the sine to X, and subtract the cosine from Y.

For anyone reading this and struggling a bit, I want to just stress this - even at the highest levels, these formulae don't just appear in people's minds. It is really helpful to grab a sheet of paper and draw grids and diagrams and write out example scenarios. Everyone who programs graphics falls back to a pen and paper when they need to think things through!

Updating our handleEvents function to make use of these new formulae gives us the following code:

            else if(event.key.keysym.sym == SDLK_w || event.key.keysym.sym == SDLK_UP)
            {
+               x += cos(yaw * 3.1415 / 180);
+               y += sin(yaw * 3.1415 / 180);
            }
            else if(event.key.keysym.sym == SDLK_s || event.key.keysym.sym == SDLK_DOWN)
            {
+               x -= cos(yaw * 3.1415 / 180);
+               y -= sin(yaw * 3.1415 / 180);
            }
            else if(event.key.keysym.sym == SDLK_a || event.key.keysym.sym == SDLK_LEFT)
            {
+               x -= sin(yaw * 3.1415 / 180);
+               y += cos(yaw * 3.1415 / 180);
            }
            else if(event.key.keysym.sym == SDLK_d || event.key.keysym.sym == SDLK_RIGHT)
            {
+               x += sin(yaw * 3.1415 / 180);
+               y -= cos(yaw * 3.1415 / 180);
            }

Again, it's necessary to make sure our angles are converted to radians before using the built-in trigonometry functions, but otherwise this is all we need to do.

If you run this code, you should find that your arrow keys now move around relative to the camera, not the fixed axes of the world!

Fixing the cursor

If you have been running the code as we've been writing it up to this point, you may have noticed one really irritating issue.

Even though we can look around by moving our mouse within our window, as soon as the mouse moves outside the window we no longer receive any movement information. Even in fullscreen mode, we can only move the mouse to the edge of the screen, but no further, preventing us from continually turning the camera in one direction. This limitation makes our program unusable for many applications.

This is a fundamental limitation of how events are handled - but there is a way around it. We can use a special SDL setting to put our cursor in relative mode.

When this setting is applied, our window takes full control of the cursor. The cursor icon will be hidden from the user and disappear entirely from the screen. Importantly though, any time the mouse is moved, our window will still receive the movement information, along with any other mouse events like clicks. As there is no longer a cursor on screen, there is no window or screen edge to block mouse movement. Therefore, users can move their mouse to the left or right indefinitely, and the camera will never stop turning.

We can therefore fix our code to allow us to continually look to the sides by putting SDL in this state at the start of our program:

    glBindVertexArray(0);

+   SDL_SetRelativeMouseMode(SDL_TRUE);

    glClearColor(0.04f, 0.23f, 0.51f, 1.0f);

The function to do this, SDL_SetRelativeMouseMode, is just a switch where you pass in either SDL_TRUE or SDL_FALSE, so you can toggle relative cursor mode on and off at any point in your code. This is useful if your program has a fullscreen pause menu - you can toggle it off to temporarily give the user a regular cursor again.
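As a rough sketch, assuming your program tracked a hypothetical paused flag, the toggle might look like this:

// Hypothetical example: 'paused' is a bool we maintain ourselves.
// Show the normal cursor while the pause menu is open, and
// recapture the mouse when gameplay resumes.
SDL_SetRelativeMouseMode(paused ? SDL_FALSE : SDL_TRUE);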

Quick note on laptops

Just before we finish, I want to quickly point out something that can really confuse and annoy users and developers when writing programs with controls like this. On laptops, a lot of Linux distributions momentarily disable the touchpad when key presses are detected, on the assumption that any touchpad contact while typing comes from wrists accidentally brushing it, and is therefore accidental in nature.

This contact isn't always unintentional though, and the behaviour can prevent users from simultaneously using the arrow keys with one hand and the touchpad with the other.

If you want your program to be able to accept both forms of input simultaneously (again, this is not necessary on desktops or laptops with an external mouse), you will need to tell your users to explicitly allow this in their desktop environment. For example, on GNOME based systems it can be done within the gnome-tweaks tool, or from the command line. Other desktop environments have different ways to alter this setting, but it should be easy to find online.
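On GNOME, for instance, this can typically be changed with a single gsettings command - the exact schema key may vary between versions, so treat this as a starting point to verify on your system:

gsettings set org.gnome.desktop.peripherals.touchpad disable-while-typing false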

Windows allows both forms of input simultaneously by default.

Conclusion

Now everything should be in place for us to move around our virtual world with a first-person style camera. Compile and run, and you should be able to both look and move around, with movement that now takes your heading into account!

Next up, we'll look at how to control the speed at which we move across our world. It's slightly more difficult than it might sound! See you there!