GraphicsBlast
The Basics | C++ | Windows | Command Line

Lesson 7: Camerawork

Moving around: Creating a camera and controlling its position within our world means we can begin adding basic movement

In this tutorial we look at how the "camera" in our world can be modelled and controlled, and add some basic movement using the arrow keys.

In our previous lessons, all our coordinates were in relation to the window, ranging from -1 to +1 in each axis. These are known as the Normalised Device Coordinates. But clearly trying to model an entire world in such confined coordinates is quite difficult. It's much easier to use human-intuitive scales. What we can do instead is use a natural coordinate system to build a virtual world, and then use some mathematics to transform what's visible in that world into our -1 to +1 scale window.

In this lesson we'll look at remodelling our scene in a world coordinate system. We'll add a movable camera to that world, and then cover the process of projecting that system back into our window.

Note that we'll use a world coordinate system where the X and Y axes represent the floor plane, and Z the vertical component. There is some logic to using X and Y for the floor, as it's intuitively like reading a map of the world. Virtually every program or piece of modelling software uses a slightly different convention here about which axis represents which direction, but it doesn't matter too much as long as you pick a system and stick with it consistently in your code. We'll use a system where X represents forwards, Y to the left, and therefore following the right-hand rule, Z will be up.

Also note that we'll be transforming between various 3D coordinate systems in this lesson by using transformation matrices. If you're not comfortable with matrices or matrix multiplications, we have a special page introducing the topic here, and an introduction to using them as the primary tool for performing transformations in 3D here.

Model Space

Example of coordinates in model space

Usually any object more complex than a simple triangle or square that we need to render will be modelled first in a tool like Blender. The model's vertex coordinates will be saved into a file, which our program will then read to display the model.

As the modeller will probably not know where the model should go in your virtual world, the model will ideally sit centred on the origin. We describe the coordinates of this model as being in model space - they are relative to the model's origin. The coordinates are centred on the X and Y axes. We imagine the Z axis as representing the object's height, and therefore a Z coordinate of zero represents the floor. The object is therefore not centred on the Z axis, but should just be touching it. For some objects, such as buildings, we sometimes go just below zero to make sure there are no seams or small gaps in case the ground is not even.

World Space

Example of coordinates in world space

To begin constructing our virtual world, we then transform each of our objects into world space. We can imagine this coordinate system as like a virtual map of the world in which our objects are to be placed.

Each of our models needs to be moved into this coordinate system. To capture all the possible ways we might need to rotate, translate or otherwise transform the model's coordinates to get them into world space, we use a transformation matrix. These are compact 4x4 arrays of floats, which can capture any transformation or combination of transformations. Again, we have a page dedicated to the topic if you want a deeper insight into how transformation matrices work. Any transformation matrix which moves vertices from model space to world space is called a model matrix.

In our image examples, we originally had a box created in model space. To make that object appear three times in our world, we would need to create three model matrices, with each of these describing the transformation necessary to move the box from model space (sat on the origin) to one of the three positions it appears in our world.
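To make this concrete, here is a minimal sketch of what those three model matrices could look like using the GLM library we set up later in this lesson (the three positions are made up purely for illustration):

glm::mat4 box1 = glm::translate(glm::mat4(1.0f), glm::vec3(5.0f, 0.0f, 0.0f));  // five units forwards
glm::mat4 box2 = glm::translate(glm::mat4(1.0f), glm::vec3(5.0f, 3.0f, 0.0f));  // forwards and three units to the left
glm::mat4 box3 = glm::translate(glm::mat4(1.0f), glm::vec3(8.0f, -2.0f, 0.0f)); // further away, and to the right

Each matrix starts as the identity and is then translated somewhere different, so the same model-space vertices end up in three different places in world space.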

What you choose to be the origin of the world space coordinate system does not matter, but usually it's easier if everything is relatively close to the origin.

We can also imagine that our camera is positioned somewhere within this world space, and is free to move around and look at anything.

Camera Space

The first step towards rendering our world is to make everything relative to the camera inside it. That is to say, we make the camera the centre of the world. Things close to the camera will now be close to the origin.

We apply a single transformation to every vertex in our world, rotating and translating it so that everything is now centred around the camera. This transformation matrix is known as the view matrix, in the sense that it gives the camera's "view" of the world. Once it's been applied, everything in this coordinate system is described as being in camera space, or relative to the camera.

Calculating this matrix simply involves finding the transformation matrix which moves the camera from its position in the world to the origin. As we know the camera's position and orientation in our world, it's simply a matter of encoding that as a transformation matrix.
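One way to picture this, as a rough sketch: if cameraMatrix were the transformation placing the camera at its position and orientation in the world, the view matrix is simply its inverse (cameraMatrix below is hypothetical, covering position only for brevity):

glm::mat4 cameraMatrix = glm::translate(glm::mat4(1.0f), glm::vec3(x, y, z)); // places the camera at (x, y, z)
glm::mat4 vMatrix = glm::inverse(cameraMatrix);                               // moves the world so the camera sits at the origin

Later in this lesson we'll let GLM build the view matrix for us directly with its glm::lookAt function.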

After applying this transformation, things in front of the camera will now be in front of the origin, things above the camera will be above the origin, and things far away from the camera will be far from the origin.

Clip Space

This is where the magic happens.

Once we have all our geometry in camera space, we then need to map it to Normalised Device Coordinates, the box of -1 to +1 in each axis which our window can actually display. If we have a transformation matrix which can map all the geometry in camera space which is in front of the camera to this coordinate space, we then have a way of rendering the world as the camera sees it.

That's exactly what our projection matrix does.

Projection matrices come in two main forms.

First of all there is the orthographic projection matrix, which performs this transformation while keeping parallel lines parallel. The result is a flat, 2D view of the world. There is no perspective in the camera. If we rendered a building like this, it would appear flat on screen, like it would in a set of blueprints. Depth just decides what's drawn on top of what in this scenario.
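For reference, GLM can construct one of these with its glm::ortho function; a quick sketch with made-up bounds:

glm::mat4 orthoMatrix = glm::ortho(-10.0f, 10.0f,  // left and right bounds
                                   -10.0f, 10.0f,  // bottom and top bounds
                                   0.1f, 100.0f);  // near and far clip planes

Everything inside that box is mapped straight into Normalised Device Coordinates, with no perspective applied.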

On the other hand, a perspective projection matrix transforms things the way a real camera would: things closer to the camera appear larger, giving a parallax effect.

One thing to note is that whichever type of projection matrix you use, things need to be transformed into the Normalised Device Coordinates of our window, which have a range of -1 to +1 in the depth (Z) axis too. This means there needs to be a finite minimum and maximum distance at which things in our world can be rendered.

To take a perspective projection matrix as an example, it could transform things up to a distance of say 100 units from the camera into the -1 to +1 range of our window, but anything beyond that distance will be outside the box and therefore not be visible. This means our camera has a minimum and maximum render distance which cannot be avoided. We cannot render triangles at infinite distance from the camera.

These are known as the camera's clip planes, and they make other pieces of the underlying maths much easier too. We define a near clip plane, the minimum distance from our camera at which objects can be rendered, which will map to a Z value of -1 in our window. If we just set it to zero we would have divide-by-zero errors, as well as various other issues, so usually we set it slightly above zero. We don't usually want to render things extremely close to the camera anyway, as things super close will fill the whole view no matter how small they are. We usually use a near clip plane value of something like 0.1.

Projection matrices also have a far clip plane, which defines the maximum distance, mapped to a Z value of +1 in the window. Geometry beyond this won't be rendered, so if we want to render beyond that it would need to be faked somehow... we'll cover this later!

Unfortunately we cannot just set the far clip plane to some incredibly high value to solve this problem either. We would quickly approach the precision limit of the depth buffer - floating point numbers can only capture so much precision. The result of setting the far clip plane too far away is that after the GPU has transformed the geometry, it will have trouble deciding which of two distant triangles should be closer to the camera. This leads to some really strange flickering artefacts, known as z-fighting, so generally we limit the far clip plane to a value of something like a hundred or a thousand.

A quick technical note here. Multiplying by the projection matrix technically brings everything into "clip space". The resulting values are not yet homogeneously normalised, meaning the 4th value (w) of the vec4 coordinate won't equal 1.

You don't really need to worry about this, as OpenGL handles it automatically, but I just want to give you full disclosure of the pipeline. When a vertex's position has a non-1 value in its 4th component (w), OpenGL will automatically divide the vertex's x, y, z and w values by w to normalise it. Only after this automatic normalisation step do the vertices lie in OpenGL's unit cube (assuming they're visible to the camera), and we can say they're in Normalised Device Coordinates. Before this automatic process, we say they're in clip space. This normalisation process is known as perspective division, if you want to read more about it.
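As a tiny worked example: a clip-space vertex of (1.0, -0.5, 0.8, 2.0) is divided through by its w of 2.0, giving Normalised Device Coordinates of (0.5, -0.25, 0.4), safely inside the -1 to +1 cube and therefore potentially visible.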

Summary

To recap our transformation pipeline: vertices start in model space; the model matrix moves them into world space; the view matrix makes them relative to the camera, putting them in camera space; the projection matrix maps them into clip space; and finally perspective division brings them into the Normalised Device Coordinates our window can display.

The GLM library

We could write a set of functions ourselves which abstract the process of creating 4x4 matrices, and provide functions to multiply them together, etc.

However there is a very clean, efficient and cross-platform library available to perform this task for us, called the GLM library (GL Mathematics). As the library is designed for use with OpenGL, the matrices used here can be passed directly to OpenGL/the GPU. As someone who did originally write their own functions for this, trust me, this library is a wonder. But just to make the point, it is possible to do this yourself without too much effort.

The source code for GLM is available from the library's GitHub page.

Once you have a copy of the code, setting it up is really easy. The library is entirely written as header files, so we just need to add the path to the folder containing them to our Makefile:

OBJS = main.cpp shader.cpp

+ INCLUDE_DIRS = -IC:\SDL3\include -IC:\SDL3_image\include -IC:\glm -IC:\glew-2.1.0-win32\glew-2.1.0\include

LINKER_DIRS = -LC:\SDL3\lib -LC:\SDL3_image\lib -LC:\glew-2.1.0-win32\glew-2.1.0\lib\Release\x64

It's a bit hard to see, but I've added the path to GLM's code to my INCLUDE_DIRS variable. The library's now ready to use.

Positioning

We'll start off by including the header files for the GLM library in main.cpp.

#include <stdio.h>
#include <SDL3/SDL.h>
#include <SDL3/SDL_main.h>
#include <SDL3_image/SDL_image.h>
#include <GL/glew.h>
+ #include <glm/glm.hpp>
+ #include <glm/gtc/matrix_transform.hpp>
+ #include <glm/gtc/type_ptr.hpp>

#include "shader.h"

The first header imports the core library itself, giving us access to various matrix and vector data-types in our code. The second provides the matrix transformation functions we'll use later in this lesson, such as glm::perspective, glm::lookAt and glm::translate, which live outside the core header. Meanwhile the third header includes the type pointer functions, which give us convenient ways of passing these types directly into OpenGL. The library can get quite compiler-intensive, so it spreads many of its functions across separate header files which are included individually where needed. This saves on compile time, hence these functions not being included by default.

Next, let's define a few variables we are going to need.

bool programRunning = true;
bool isFullscreen = false;
bool useWireframe = false;

+ float x = 0;
+ float y = 0;
+ float z = 0;

Shader mainShader;

GLuint vao;
GLuint vbo[3];

These variables will be used to store the position of the camera within our world. For now, our camera is positioned at (0, 0, 0) - on the origin of our world space coordinate system.

Remember that the units of the world are whatever you want them to be. If you want your world to be based on meters, feet or light years, it doesn't matter. Just make sure that you use it consistently throughout your code.

As we plan for our world coordinate system to have X forwards, Y to the left, and Z upwards, we need to update our rendered square's vertex coordinates. As things currently stand, they all have a Z value of zero and vary in the other two axes, so the square would be lying flat on the floor in our world.

Let's adjust them so that the square "stands up" in our world. Remember, these are the coordinates of a model, so they should be in model space, centred on the origin. They'll be positioned within the world by the model matrix.

    GLfloat vertices[] =
    {
+         0.0f, 0.5f, -0.5f,
+         0.0f, -0.5f, -0.5f,
+         0.0f, -0.5f, 0.5f,
+         0.0f, 0.5f, 0.5f
    };

The coordinates now vary along the Y and Z axes, but are flat in X. The square has some left-right extent and some up-down extent, so if we position it in front of the camera in our world, it will be visible.

Before going any further, I'm also going to adjust our handleEvents function so that pressing the arrow keys (or "WASD") adjusts our x, y and z variables, in effect altering the camera's position and allowing it to move around our world. I'm not going to do anything fancy right now, just adjust the position based on which key was pressed, in effect strafing the camera around.

            else if(event.key.key == SDLK_T)
            {
                useWireframe = !useWireframe;
                if(useWireframe)
                {
                    glPolygonMode(GL_FRONT_AND_BACK, GL_LINE);
                }
                else
                {
                    glPolygonMode(GL_FRONT_AND_BACK, GL_FILL);
                }
            }
+             else if(event.key.key == SDLK_W || event.key.key == SDLK_UP)
+             {
+                 x += 1;
+             }
            ...

We check if the user pressed either the up arrow key or "W" on their keyboard. If so, we move the camera forwards, along the positive X axis. Remember that in our world, X is forwards, Y is to the left, and Z is up.

Let's now do the same for the rest of the keys:

            else if(event.key.key == SDLK_W || event.key.key == SDLK_UP)
            {
                x += 1;
            }
+             else if(event.key.key == SDLK_S || event.key.key == SDLK_DOWN)
+             {
+                 x -= 1;
+             }
+             else if(event.key.key == SDLK_A || event.key.key == SDLK_LEFT)
+             {
+                 y += 1;
+             }
+             else if(event.key.key == SDLK_D || event.key.key == SDLK_RIGHT)
+             {
+                 y -= 1;
+             }
            ...

For the down arrow, we just do the reverse and subtract from the X position. When the user tries to go left, we add to the camera's Y position, and for the right we subtract from it.

As we have a Z coordinate as well for moving the camera up and down, let's implement that too:

            else if(event.key.key == SDLK_D || event.key.key == SDLK_RIGHT)
            {
                y -= 1;
            }
+             else if(event.key.key == SDLK_LSHIFT)
+             {
+                 z += 1.0;
+             }
+             else if(event.key.key == SDLK_LCTRL)
+             {
+                 z -= 1.0;
+             }
        }
    }
}

In this case, I've bound these to the left shift and left control keys, this time altering the camera's height above the ground.

Setting up our matrices

Now that our code contains the position of the camera within our world, and we can move it around as we please, we need to actually feed that information into the shaders and make use of it. We therefore need to construct our three matrices: a model matrix to position our square within the world, a view matrix to set the position of the camera, and a projection matrix to define our camera's parameters.

We'll utilise GLM to construct these matrices in the draw function - we won't need them anywhere else. As the projection and view matrices may be useful in the future for other shader programs, we'll define them at the start of the function, before we bind our shader. The model matrix on the other hand changes for each object we're drawing, so we'll define that later, right before we perform the draw.

void draw()
{
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

+     glm::mat4 pMatrix = glm::perspective(1.0f, (float) windowWidth / windowHeight, 0.1f, 100.f);
    ...

GLM gives us some nice functions to automate the creation of these matrices using a few necessary parameters. The matrices returned are of type glm::mat4, a 4x4 matrix which is compatible with OpenGL and can be passed into our shaders.

GLM's function to create a perspective projection matrix, glm::perspective, requires a few parameters to figure out how to convert between camera space and clip space. They are the camera's field-of-view, its aspect ratio, and its near and far clipping planes.

The field-of-view defines the angle our camera can see, in radians. So a higher value would make it seem like our camera has a fish-eye lens, while a much lower value can be used for effects like rifle scopes or binoculars. A normal value is somewhere between 45 and 70 degrees. I've used 1 radian here, or about 57 degrees. It's fun to play around with this value though and see the effect it has on our finished program.
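If you prefer to think in degrees, GLM's glm::radians function will do the conversion for you; a quick sketch of the same call with an explicit 57 degree field-of-view:

glm::mat4 pMatrix = glm::perspective(glm::radians(57.0f), (float) windowWidth / windowHeight, 0.1f, 100.0f);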

As our window on the desktop is likely not perfectly square, the field-of-view parameter defines this angle only for the vertical axis of the camera. We then pass in the aspect ratio of the window, which allows the field-of-view in the horizontal axis to be calculated automatically, so our camera will adapt to changes in our window's size. To get the aspect ratio, we simply divide the window's width by its height.
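For example, a 1280x720 window gives an aspect ratio of 1280.0 / 720.0, roughly 1.78, so the horizontal field-of-view works out correspondingly wider than the vertical one.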

Finally, we pass in the near and far clip planes. This is everything GLM needs to construct a perspective projection matrix for us.

GLM also has a nice function to create the view matrix automatically too:

void draw()
{
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

    glm::mat4 pMatrix = glm::perspective(1.0f, (float) windowWidth / windowHeight, 0.1f, 100.f);
+     glm::mat4 vMatrix = glm::lookAt(glm::vec3(x, y, z), glm::vec3(x+1, y, z), glm::vec3(0, 0, 1));

    mainShader.bind();

This function is slightly simpler, as all it needs to do is create a matrix that re-centres the world on the camera, rather than encode the camera's distortion.

To construct this matrix, GLM's glm::lookAt function needs to know the camera's position and orientation. We first pass the camera's location in our world (as a glm::vec3), and then the position of something it is looking at. For this, I've simply passed the same coordinates again but with a +1 in the X axis, so the camera will always be looking towards the positive X axis.

The final parameter defines which axis is our "up" axis in our world. This is necessary as without it, our camera could be upside-down, but still in the same position and looking at the same thing. We pass another glm::vec3 indicating that the top of the camera is aligned with the Z axis, the "up" axis of our world. From these values, GLM can calculate the camera's position and orientation in our world, and generate a matrix to transform our geometry from world space to camera space - where the camera is at the centre of the world.

With our view and projection matrices defined, we can bind our shader and then pass them in:

    mainShader.bind();

+     glUniformMatrix4fv(0, 1, GL_FALSE, glm::value_ptr(pMatrix));
+     glUniformMatrix4fv(1, 1, GL_FALSE, glm::value_ptr(vMatrix));

    glBindVertexArray(vao);

We pass the matrices in to our shaders as something called uniforms. When we passed data in to our vertex shader before, we passed it in as vertex attributes, which were different for each vertex. But when we know the data will be identical for every single vertex, we can instead pass it in as a "uniform", meaning the data is the same for every vertex. The GPU can then use this information to optimise memory usage.

To pass the data in as a uniform to our shader, we use the function glUniform suffixed with the data-type we're passing in. A ton of these functions exist to pass in all the various possible data types you might want to use. In our case, we use glUniformMatrix4fv, as we have a Matrix of 4x4 shape. The "f" denotes we're passing floats, and the "v" denotes we're passing an array (in this case an array of just a single matrix).

The first parameter of the function defines which uniform location the data is being passed to, just as we did for our VBOs. These locations are independent of the attribute locations, hence starting at zero again. We pass the projection matrix to location 0, and the view matrix to location 1. The next parameter is how many matrices we are passing; we pass a single matrix each time, so we use 1. The third parameter sets whether we want OpenGL to transpose the input before sending it to our shader. This can be useful if your matrix column/row ordering differs from OpenGL's, but GLM matches OpenGL's ordering, so we pass GL_FALSE.

We finish up the function calls by passing a pointer to our data, which will be passed to the shader. To do this, we use the glm::value_ptr function to get a pointer to the GLM matrix variable's raw data. Our view and projection matrices will now be accessible in our shaders.
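As an aside, if you ever don't want to hard-code these locations, OpenGL can look a uniform up by name at runtime with glGetUniformLocation. A quick sketch, assuming a hypothetical programId handle for our shader program and the uPMatrix name we'll declare in the vertex shader shortly:

GLint location = glGetUniformLocation(programId, "uPMatrix"); // returns -1 if no active uniform has this name
glUniformMatrix4fv(location, 1, GL_FALSE, glm::value_ptr(pMatrix));

We'll stick with explicit locations in this series, as they keep the shader and the C++ side in sync at a glance.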

To finish up with our main.cpp, we need to create the model matrix too, and push it to the GPU. Remember, this takes our square and positions it somewhere in our virtual world.

    glBindVertexArray(vao);

+     glm::mat4 mMatrix = glm::mat4(1.0f);
+     mMatrix = glm::translate(mMatrix, glm::vec3(5.0, 0.0, 0.0));
+     glUniformMatrix4fv(2, 1, GL_FALSE, glm::value_ptr(mMatrix));

    glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, 0);

To do this, right before we draw this particular geometry, we create a new GLM matrix with the glm::mat4 constructor. GLM initialises the diagonal of the matrix with the value we pass here, so by passing a value of one we create an identity matrix. We then translate the matrix 5 units along the X axis, meaning the model matrix now represents that translation; any point multiplied by it will undergo this translation. We then upload it to the GPU as we did for the other matrices, this time into the third uniform location.

If we wanted to draw a second square here, we would simply create a new model matrix, translate it to a different location, upload it again, and then make another call to glDrawElements. Our shader, VAO and view and projection matrices are all already set, so they will be re-used for drawing the extra square.
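A sketch of what drawing that second square might look like, with a made-up position two units to the left of the first:

    glm::mat4 mMatrix2 = glm::mat4(1.0f);
    mMatrix2 = glm::translate(mMatrix2, glm::vec3(5.0, 2.0, 0.0)); // same distance forwards, two units to the left
    glUniformMatrix4fv(2, 1, GL_FALSE, glm::value_ptr(mMatrix2));  // overwrite the model matrix uniform

    glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, 0);           // draw the same geometry again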

Updating our vertex shader

We now have our matrices passed in to our shaders as uniforms. Let's update our vertex shader to make use of them:

#version 460

+ layout(location = 0) uniform mat4 uPMatrix;
+ layout(location = 1) uniform mat4 uVMatrix;
+ layout(location = 2) uniform mat4 uMMatrix;

layout(location = 0) in vec3 aPosition;
layout(location = 1) in vec3 aColour;

out vec3 colour;

void main()
{
    colour = aColour;
+     gl_Position = uPMatrix * uVMatrix * uMMatrix * vec4(aPosition, 1.0f);
}

At the top of our vertex shader we declare the three uniforms we will receive. They use the same layout notation as regular shader inputs, but as they are the read-only uniform type, we only need to write uniform rather than defining them as inputs or outputs. We also switch to the data-type mat4, a 4x4 matrix and one of the built-in GLSL types.

These variable names are by convention prefixed with a "u" to indicate that they're uniforms.

The nice thing about transformation matrices is that to actually move a vertex's position through these spaces, we simply multiply it by each matrix in turn. This is something shader programs are highly optimised for.

Remember that when working with matrices, we start on the right, and then progressively multiply by the matrices to the left. This is mathematical convention, and understood by our GPU. We start by making the existing vertex position a vec4, a homogeneous coordinate, with the final value of 1 meaning that it is a point in space which can therefore be translated, unlike directional vectors for example.
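For example, multiplying a translation matrix that moves things 5 units along X with the point (0, 0, 0, 1) gives (5, 0, 0, 1), whereas the direction (1, 0, 0, 0) would pass through the same matrix completely unchanged, precisely because its fourth component is 0.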

This point is then multiplied by the model matrix, resulting in the vertex now being in world space. If we think about what shape the result of that matrix multiplication would be, we can see that it would be a 4x1 matrix, or simply another vec4.

The result is then multiplied by the view matrix to make it relative to the camera, or in camera space.

Finally, the point is multiplied by the projection matrix, which applies the camera's parameters to the point, moving it into clip space. Normalising the homogeneous coordinate is then performed automatically, resulting in the point lying within the -1 to +1 cube of Normalised Device Coordinates if it's visible to our camera.

Conclusion

Great, so now we have a virtual world set up! We can now define models in their own coordinate system and then position them within our world. We can then define a camera position in that world and transform everything it would see to our window.

Compile and run the code and you should now be able to move around within the world! It's only simple movement right now, but we'll fix that in the next lesson when we add the ability to look around with our mouse.