GraphicsBlast
The Basics | C++ | Linux | Command Line

Lesson 8: Movement

Exploration: In this tutorial we look at how we can use the mouse to look around and move within our world.

Previously, we saw how we could use the view matrix to model a camera and perform some very basic movement of it in our virtual world.

In this tutorial we'll take this idea a step further and add the ability to control the orientation of the camera with the mouse, allowing us to look around. We'll then alter our keyboard code so that when we move around, this movement is relative to the direction the camera is facing, instead of simply moving along the world axes.

Rotation Refresher

Mathematically, there are several ways we can represent the rotation of our camera, and of objects in general. The most common, and the one we'll use here, is Tait–Bryan angles. These are more commonly referred to by the names of the individual rotations, namely an object's "roll", "pitch", and "yaw".

Visualising our roll, pitch, and yaw axes of rotation

A change of the camera's pitch is analogous to tilting the camera up and down. This represents a rotation around the Y axis, visualised in green in the image. Remember, we've defined our coordinate system with the X axis pointing forward, the Y axis left, and the Z axis upwards.

Similarly, the camera's yaw angle represents its rotation around the vertical Z axis (blue in the image), in effect allowing us to look left or right. From these two types of rotation alone, pitch and yaw, we can model most common camera systems, for example first- and third-person cameras.

We can of course also rotate the camera in its X, or roll, axis (red in the image). This kind of rotation is analogous to rotating your phone from taking a portrait photo of something to landscape. It remains pointing at the same object, but the image spins about its centre.

This kind of rotation is used less often for cameras than pitch and yaw, generally only for special camera effects. Because of this, and for simplicity, we'll focus on modelling just the camera's pitch and yaw rotation for now.

Looking around

In essence, to use the mouse to look around our world, we'll capture any mouse movement events, and then use that movement to apply a rotation to our view matrix - in effect rotating our camera within our world.

If we capture this event in our handleEvents function, we can extract the number of pixels the mouse has moved left/right and up/down across our window.

We'll maintain two variables containing the camera's current pitch and yaw rotation. We can then use the up/down mouse movement to update our camera's pitch, and the left/right movement to alter its yaw. Then when we draw to our window, rather than setting the camera's lookAt target at a fixed point along the X axis, we can adjust its location based on the pitch and yaw variables, in effect giving us the ability to look around with the camera based on the mouse's movement.

In an ideal world, we could take the number of pixels the mouse has moved, and simply add or subtract that number from the pitch or yaw, so one pixel of movement would apply one degree of rotation to our camera. Unfortunately, the end result would be far too intense. Moving the mouse a few hundred pixels, only a fraction of a typical screen's width, would whip the camera through large rotations, making precise control difficult for the user.

As a result, it's fairly common to apply a sensitivity factor to our mouse movement to make the rotations more reasonable. We can take our mouse deltas (delta commonly meaning the amount something has changed by, in this case the number of pixels moved) and multiply these values by a scaling factor. A bigger scaling factor means we're more sensitive to mouse movement - the camera rotates more per pixel of movement - while a smaller factor means the camera rotates less when the mouse is moved.
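In other words, for each motion event:

change in angle (degrees) = pixels moved * sensitivity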

Camera Target Trigonometry

Yaw

For now, let's ignore the pitch rotation, and concentrate only on the camera's yaw (looking left/right).

What we want to do is take the camera's lookAt target, and rotate it around the camera's position based on the value of the yaw angle. Conceptually, we have angles and circles and unknown positions, so hopefully this should immediately strike you as a trigonometry problem.

For me, the easiest way to mentally work through the problem is to imagine how the camera's lookAt target position would move based on various possible values of the yaw. Remember, our rotations follow the right-hand rule, so a positive yaw should rotate our camera to the left.

If we concentrate on the X axis for a moment, we can see that we need a function that gives +1 when the input (yaw angle) is 0°, 0 at 90°, -1 at 180°, and 0 again at 270°. That's a cosine wave! Likewise for the Y axis, it's fairly similar, but for an input of zero it should return zero, then give +1 at 90°, 0 at 180°, and -1 at 270°. Using the same logic, we can see that the Y axis is just a sine wave of the yaw.

Therefore, we can set the camera's target position in the following way:
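targetX = x + cos(yaw)
targetY = y + sin(yaw)
targetZ = z

(With the pitch ignored, the Z coordinate is simply left unchanged.)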

Setting up our code in such a way would allow us to rotate our camera to look around our world according to whatever value we set the yaw angle to.

Pitch

We can apply a similar logic to also factor in the camera's pitch. Unlike the yaw though, we generally don't allow the camera to pitch through a full 360°. In fact, we generally limit it to +/- 90°, i.e. from looking vertically downwards to vertically upwards.

The reason for this is not a limitation of the mathematics or code, but simply stylistic. It doesn't make sense for a first-person view to look upwards beyond vertical - the camera would become inverted, which isn't realistically possible outside of flight simulators. Therefore we'll limit our camera's pitch in this tutorial, and only consider it in the range from vertically downwards to vertically upwards.

Using the same methodology, we need a function which gives 0 at 0°, -1 at 90°, and +1 at -90°. No basic trigonometric function fits these properties directly, but if we invert (negate) a sine wave, we get exactly what we need. So we can get the target's Z position by subtracting the sine of the pitch angle.
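targetZ = z - sin(pitch)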

There are two important points here though.

First, as we said, we will prevent the camera going beyond +/- 90° of pitch, vertically up or down. There's nothing inherently wrong with going beyond this limit, but currently when calculating our view matrix we set the up vector to +1 in the Z axis. If we were to look further upwards than the vertical axis, our camera should become inverted. If you're struggling to imagine this, just keep rotating your head upwards until you're looking behind yourself - the world will be upside-down!

In this case, our up vector should be flipped to point downwards if we pitch beyond 90°, but ours is hard-coded to always point up. It's easier, and frankly usually more realistic in most scenarios, to instead limit how far we can look up or down. Therefore, I'm going to simply prevent our pitch variable from ever reaching +/- 90°.

Ensuring the pitch is always less than +/- 90° also avoids an edge case: if we were looking exactly vertically up or down, our up vector and our view direction would point along exactly the same axis, and the orientation of the camera would become undefined. Again, we could fix this with some clever code, or we can just limit the pitch range.

The second point is that if you look at the above maths closely, you will see an issue. At 90° pitch, we need to be looking vertically downwards. That means the target's Z value (relative to the camera) should be -1, but also that the X and Y offsets need to drop to zero to make us look perfectly downwards. Currently though, our X and Y offsets always have a combined magnitude of 1, i.e. they always assume the pitch is zero. Therefore we need to scale the target's X and Y coordinates by the pitch to fully integrate its rotation.

For this scale factor, we need a function that doesn't affect the X and Y coordinates when the pitch is zero, i.e. multiplies them by 1, but which scales them towards zero as the pitch moves towards +/- 90°. That's the cosine of the pitch. This gives us a final set of equations for rotating our camera using pitch and yaw:
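targetX = x + cos(yaw) * cos(pitch)
targetY = y + sin(yaw) * cos(pitch)
targetZ = z - sin(pitch)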

Building our look-at target

To start implementing this system, let's begin by setting up the variable to control how sensitive our mouse movement is:

int windowWidth = 1024;
int windowHeight = 600;

+ float mouseSensitivity = 0.3;

SDL_Window* window = NULL;
SDL_GLContext context = NULL;

I've set this up at the start of our code where it can be easily configured.

Adjusting our mouse sensitivity is then just a matter of scaling our mouse deltas by this variable. For me, a value of approximately 0.3 gives good, balanced control of the camera. Feel free to adjust this; some people prefer a more sensitive mouse, some less. Some programs even make this a configurable parameter and let the user adjust it themselves.

We also need to set up the variables to hold the camera's current pitch and yaw angles:

float x = 0;
float y = 0;
float z = 0;
+ float pitch = 0;
+ float yaw = 0;

Shader mainShader;

Initialising the pitch to zero means the camera is neither looking up nor down when the program starts, but directly at the horizon. Likewise, zero yaw means no left/right rotation at start-up, so the camera will begin by facing in exactly the same direction as in the last lesson - along the positive X axis.

Let's now write the code in our handleEvents function to update the pitch and yaw whenever any mouse movement occurs:

        else if(event.type == SDL_EVENT_WINDOW_RESIZED)
        {
            windowWidth = event.window.data1;
            windowHeight = event.window.data2;
            glViewport(0, 0, windowWidth, windowHeight);
        }
+       else if(event.type == SDL_EVENT_MOUSE_MOTION)
+       {
+           pitch += event.motion.yrel * mouseSensitivity;
+           yaw -= event.motion.xrel * mouseSensitivity;

            ...

We can capture mouse movements by checking for events of type SDL_EVENT_MOUSE_MOTION, just as we've checked for all the other kinds of events. When we detect this event, we can access its relative movement values - the number of pixels on screen the mouse has moved up/down and left/right. These are available to us in the event.motion.xrel and event.motion.yrel properties. There are also other properties available, such as the mouse's new absolute coordinates in our window, but the relative motion is what we're interested in for first-person camera control.

These movements are reported as floating point values, as certain effects (like display scaling) can result in non-integer pixel movements. Both the absolute and relative coordinates of the mouse are reported in the window's coordinate system. This means the origin is located in the top-left corner of the window, with the X axis extending to the right and the Y axis downwards. Therefore, if the mouse is moved upwards, we will see a negative relative value for the Y axis.

Let's think about how we want these movements to be mapped onto our camera as a rotation. When the mouse is moved downwards, we are given a positive value for the Y movement. From the right-hand rule, we know that a positive change in the pitch will make us look downwards. Well that aligns nicely. If we just add the result, downwards mouse movement will rotate our camera downwards. So for every pixel the mouse moves in the Y axis, we multiply it by our sensitivity and add the result to our camera's pitch.

The inverse of course also holds true, so moving the mouse upwards will give us a negative movement in the Y axis, which will result in a subtraction from the pitch, angling our camera upwards.

For the yaw things are a little bit more tricky. Moving our mouse to the right will give us a positive mouse movement. But a positive yaw rotation should rotate our camera to the left.

As there is a mismatch in the direction, we need to invert the result of the yaw calculation, meaning leftwards mouse movement, which is negative, increases our yaw angle. Again we multiply this value by the mouseSensitivity variable to provide a better mapping between the number of pixels the mouse has moved and the change in angle, which should be in degrees. As we have mouseSensitivity fixed at 0.3, this means 1 pixel of movement will correspond to 0.3 degrees of rotation.

The function finishes up by making sure both variables remain within certain bounds:

        else if(event.type == SDL_EVENT_MOUSE_MOTION)
        {
            pitch += event.motion.yrel * mouseSensitivity;
            yaw -= event.motion.xrel * mouseSensitivity;

+           if(yaw > 360)
+               yaw -= 360;
+           else if(yaw < 0)
+               yaw += 360;
+
+           if(pitch > 30)
+               pitch = 30;
+           else if(pitch < -30)
+               pitch = -30;
+       }
        else if(event.type == SDL_EVENT_KEY_DOWN)
        {

We bound the yaw angle to between 0° and 360° mainly for debugging reasons. A yaw rotation of 500° would still produce the correct mathematical output, but is far harder to mentally picture than the equivalent yaw of 140°, both of which face our camera in exactly the same direction. Moreover, bounding the yaw prevents possible floating-point precision issues if the variable were to grow very large during a long session.

Similarly, for the pitch, we limit the range to plus or minus 30 degrees. 30° is my arbitrary choice for limiting up/down motion here; you can extend it if you wish. Just remember that you will begin to hit issues at 90° without writing additional code to compensate the up vector, so in our case we need to make sure the pitch can never reach 90°. Try it without these limits if you want to see what happens!

With our pitch and yaw angles properly set from mouse movement, we can now incorporate them into our view matrix:

void draw()
{
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

    glm::mat4 pMatrix = glm::perspective(1.0f, (float) windowWidth / windowHeight, 0.1f, 100.0f);

+   float yawRadians = yaw * 3.1415 / 180;
+   float pitchRadians = pitch * 3.1415 / 180;

+   float targetX = x + (cos(yawRadians) * cos(pitchRadians));
+   float targetY = y + (sin(yawRadians) * cos(pitchRadians));
+   float targetZ = z - sin(pitchRadians);

    ...

We calculate each component of the camera's target position using the formulae we previously derived. As the C++ maths functions work exclusively with radians, we start by converting our yaw and pitch angles into them.

Radians are a more mathematically elegant way of expressing angles than degrees, which were arbitrarily defined as 1/360th of a circle. Rather than this arbitrary division, a radian is 1/(2π) of a full circle, which makes some more advanced mathematics a bit easier, so mathematicians often prefer them. One radian is equal to about 57°.

We can convert our degrees to radians by multiplying by π and then dividing by 180. So the first two lines perform this conversion, and the next three are our formulae for setting the lookAt position, exactly as we derived before.
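For example, a yaw of 140° becomes 140 * π / 180 ≈ 2.44 radians.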

Looking at this, you may be wondering why we don't just use radians everywhere - why not store the camera's pitch and yaw as radians, and exclusively use this mathematically "preferred" system? The answer is simplicity. Most people can imagine approximately what an angle of 140° looks like, but if I tell you to imagine a rotation of 2.4 radians, you will probably struggle. So while radians are preferred for working with equations, for debugging - and anywhere you deal with humans - degrees are much preferred!

The resulting coordinates can then be fed into our existing call to create the view matrix:

    float targetX = x + (cos(yawRadians) * cos(pitchRadians));
    float targetY = y + (sin(yawRadians) * cos(pitchRadians));
    float targetZ = z - sin(pitchRadians);

+   glm::vec3 target = glm::vec3(targetX, targetY, targetZ);
+   glm::mat4 vMatrix = glm::lookAt(glm::vec3(x, y, z), target, glm::vec3(0, 0, 1));

    mainShader.bind();

This code simply creates a GLM vec3 called target from the three values we just calculated, which is then passed into the same glm::lookAt call as before.

Great! So with that in place, you should now be able to compile and run the program, and experience looking around by moving your mouse.

Camera Movement

If you did compile your program, you may also have noticed that if you try to move around, the direction of movement is fixed and still aligned to the world axes. Moving "forwards" will always move you towards the positive X axis, the direction you were looking when the program started, but not necessarily the direction you're looking now. That's what we'll solve next.

Currently, pressing the "W" key or the up arrow repositions our camera one unit further along the world's X axis. To make it so we always move in the direction the camera is facing, we again need to pull in our knowledge of trigonometry. As before, to derive the formulae I think the best approach is to take a series of inputs, imagine what the outputs should be, and then figure out the underlying function.

To start off with, we'll focus on forwards movement.

First of all, we can conclude that forwards movement completely ignores the camera's pitch angle. If we assume we are walking on flat ground, having the camera pitched slightly up or down won't affect what moving forwards does. Only the camera's yaw matters for what we calculate here.

Examining what happens when we move forward for various yaw inputs, we can observe the following:
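Yaw     Change in X     Change in Y
0°      +1              0
90°     0               +1
180°    -1              0
270°    0               -1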

So to move 1 unit forwards, we can add cosine(yaw) to the X position, and sine(yaw) to the Y position. Both of these results can of course be scaled to take bigger or smaller steps.

Moving backwards is really easy: we simply do the reverse of what we did to move forwards. So instead of adding the above values, we subtract them to get the desired effect.

Then we need to consider left/right movement. Just like forwards and backwards, these are the inverse of each other, so we only really need to figure out one. I'll start by assuming the left key has been pressed:
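Yaw     Change in X     Change in Y
0°      0               +1
90°     -1              0
180°    0               -1
270°    +1              0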

We can see that the change in the X axis follows a negative sine wave of the yaw, and the Y axis a cosine wave. Therefore a left movement subtracts the sine of the yaw from the X axis, and adds the cosine of the yaw to the Y axis. To move right, we simply do the opposite: add the sine to X, and subtract the cosine from Y.

For anyone reading this and struggling a bit, I want to just stress this - even at the highest levels, these formulae don't just appear in people's minds. It is really helpful to grab a sheet of paper and draw grids and diagrams and write out example scenarios. Everyone who programs graphics falls back to a pen and paper when they need to think things through!

Updating our handleEvents function to make use of these new formulae gives us the following code:

            else if(event.key.keysym.sym == SDLK_w || event.key.keysym.sym == SDLK_UP)
            {
+               x += cos(yaw * 3.1415 / 180);
+               y += sin(yaw * 3.1415 / 180);
            }
            else if(event.key.keysym.sym == SDLK_s || event.key.keysym.sym == SDLK_DOWN)
            {
+               x -= cos(yaw * 3.1415 / 180);
+               y -= sin(yaw * 3.1415 / 180);
            }
            else if(event.key.keysym.sym == SDLK_a || event.key.keysym.sym == SDLK_LEFT)
            {
+               x -= sin(yaw * 3.1415 / 180);
+               y += cos(yaw * 3.1415 / 180);
            }
            else if(event.key.keysym.sym == SDLK_d || event.key.keysym.sym == SDLK_RIGHT)
            {
+               x += sin(yaw * 3.1415 / 180);
+               y -= cos(yaw * 3.1415 / 180);
            }

Again, it's necessary to make sure our angles are converted to radians before using the built-in trigonometry functions, but otherwise this is all we need to do.

If you run this code, you should find that your arrow keys now move around relative to the camera, not the fixed axes of the world!

Fixing the cursor

If you have been running the code as we've been writing it up to this point, you may have noticed one really irritating issue.

Even though we can look around by moving our mouse within our window, as soon as the mouse moves outside the window we no longer receive any movement information. Even in fullscreen mode, we can only move the mouse to the edge of the screen, but no further, preventing us from continually turning the camera in one direction. This limitation makes our program unusable for many applications.

This is a fundamental limitation of how events are handled - but there is a way around it. We can use a special SDL setting to put our cursor in relative mode.

When this setting is applied, our window takes full control of the cursor. The cursor icon will be hidden from the user and disappear entirely from the screen. Importantly though, any time the mouse is moved, our window will still receive the movement information, along with any other mouse events like clicks. As there is no longer a cursor on screen, there is no window or screen edge to block mouse movement. Therefore, users can move their mouse to the left or right indefinitely, and the camera will never stop turning.

We can therefore fix our code to allow us to continually look to the sides by putting SDL in this state at the start of our program:

    glBindVertexArray(0);

+   SDL_SetRelativeMouseMode(SDL_TRUE);

    glClearColor(0.04f, 0.23f, 0.51f, 1.0f);

The function to do this, SDL_SetRelativeMouseMode, is just a switch where you pass in either SDL_TRUE or SDL_FALSE, so you can toggle relative cursor mode on and off at any point in your code. This is useful if your program has a fullscreen pause menu - you can toggle it off to temporarily give the user a regular cursor again.
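As a rough sketch, assuming your program tracked a hypothetical paused flag, the toggle might look like this:

// Hypothetical example: 'paused' is a bool we maintain ourselves.
// Show the normal cursor while the pause menu is open, and
// recapture the mouse when gameplay resumes.
SDL_SetRelativeMouseMode(paused ? SDL_FALSE : SDL_TRUE);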

Quick note on laptops

Just before we finish, I want to quickly point out something that can really confuse and annoy users and developers when writing programs with controls like this. On laptops, a lot of Linux distributions momentarily disable the touchpad when key presses are detected, on the assumption that any touchpad contact while typing comes from wrists accidentally brushing it, and is therefore accidental in nature.

This contact isn't always unintentional though, and the behaviour can prevent users from simultaneously using the arrow keys with one hand and the touchpad with the other.

If you want your program to be able to accept both forms of input simultaneously (again, this is not necessary on desktops or laptops with an external mouse), you will need to tell your users to explicitly allow this in their desktop environment. For example, on GNOME based systems it can be done within the gnome-tweaks tool, or from the command line. Other desktop environments have different ways to alter this setting, but it should be easy to find online.
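On GNOME, for instance, this can typically be changed with a single gsettings command - the exact schema key may vary between versions, so treat this as a starting point to verify on your system:

gsettings set org.gnome.desktop.peripherals.touchpad disable-while-typing false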

Windows allows both forms of input simultaneously by default.

Conclusion

Now everything should be in place for us to move around our virtual world with a first-person style camera. Compile and run, and you should be able to both look and move around, with movement that now takes your heading into account!

Next up, we'll look at how to control the speed at which we move across our world. It's slightly more difficult than it might sound! See you there!