Matrices

The traditional way of thinking about matrices is as systems of linear equations. For instance, you might have the following two equations:

$2x + 3y = 4$

$6x + 2y = 1$

Those two equations tell you about two relationships between the variables $x$ and $y$. With those two relationships, you can solve for what those variables are.

If you only had one equation, you would not have enough information to solve for both $x$ and $y$. In fact, they could be anything that satisfies the relationship, that is, any point on the line $2x + 3y = 4$, which, after a little bit of algebra, we can represent like this:

$2x + 3y = 4$

$3y = 4 - 2x$

$y = \frac{4 - 2x}{3}$

And if you graphed that line, you would see that every point on it is a valid solution that satisfies that relationship.

However, once we add the other line, the two will intersect in exactly one place as long as they are not parallel, and we will have a single solution for both $x$ and $y$.

$6x + 2y = 1$

$2y = 1 - 6x$

$y = \frac{1 - 6x}{2}$
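To make the intersection concrete, here is one way to finish the algebra: set the two expressions for $y$ equal to each other and solve for $x$, then substitute back. This is just a worked example of the two lines above, not the only route to the answer.

$\frac{4 - 2x}{3} = \frac{1 - 6x}{2}$

$2(4 - 2x) = 3(1 - 6x)$

$8 - 4x = 3 - 18x$

$14x = -5$

$x = -\frac{5}{14}$

$y = \frac{4 - 2 \cdot \left(-\frac{5}{14}\right)}{3} = \frac{11}{7}$

Checking against the second equation: $6 \cdot \left(-\frac{5}{14}\right) + 2 \cdot \frac{11}{7} = -\frac{15}{7} + \frac{22}{7} = 1$.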

A matrix is just a shorthand way of expressing such a system of equations, where we take away the variables and put the coefficients of the entire system in square brackets.

$\begin{bmatrix} 2 & 3 \\ 6 & 2 \end{bmatrix}$

Comparing this matrix to the system of equations above, it is now possible to see what is being represented here. Since each row contains a coefficient for $x$ and a coefficient for $y$, you could say that each row is itself a vector. And, as with the number of equations above, the number of rows tells you at most how many variables you can solve for. In some cases you will not be able to solve for a variable despite there being enough rows, but that actually tells you something about the underlying equations themselves.

Generalizing a little bit further, each row gives you a piece of information about the input space and each column gives you a piece of information about the output space.
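As a rough sketch of the "enough rows but still not solvable" case, the snippet below solves a two-by-two system using the determinant formula (Cramer's rule). That technique is not introduced in the text; it is swapped in here only because it is compact. The function name `solve_2x2` and the second example matrix are hypothetical, chosen for illustration.

```python
from fractions import Fraction


def solve_2x2(matrix, rhs):
    """Solve a 2x2 system, or return None when there is no unique solution.

    `matrix` holds the coefficients as two rows, `rhs` the right-hand sides.
    """
    (a, b), (c, d) = matrix
    e, f = rhs
    det = a * d - b * c  # zero when one row is a multiple of the other
    if det == 0:
        return None
    x = Fraction(e * d - b * f, det)
    y = Fraction(a * f - e * c, det)
    return x, y


# The system from the text: 2x + 3y = 4 and 6x + 2y = 1.
print(solve_2x2([[2, 3], [6, 2]], [4, 1]))  # x = -5/14, y = 11/7

# Two rows, but the second is just twice the first: the lines are parallel.
print(solve_2x2([[2, 3], [4, 6]], [4, 1]))  # None
```

In the second call there are still two rows, but they describe parallel lines, so the rows carry only one independent piece of information.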

If we were to visualize the rows of that matrix, we have these vectors:

And if we were to visualize the columns of that matrix, we have these vectors:

Matrix-matrix addition and subtraction are not very interesting - you just add (or subtract) the corresponding components. Again, the two matrices need to be the same size in order for this to work. So for instance, if we wanted to add the following matrices:

$\begin{bmatrix} 1 & 1 \\ 2 & 0 \end{bmatrix} + \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}$

You would end up adding each separate vector component together, which we would visualize like so:
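Alongside the picture, the arithmetic itself can be sketched in a few lines of plain Python. The helper name `mat_add` is made up for this example, and matrices are assumed to be stored as lists of rows.

```python
def mat_add(a, b):
    """Add two matrices (lists of rows) entry by entry."""
    same_shape = len(a) == len(b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(a, b)
    )
    if not same_shape:
        raise ValueError("matrices must be the same size")
    return [
        [x + y for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


total = mat_add([[1, 1], [2, 0]], [[2, 1], [1, 1]])
print(total)  # [[3, 2], [3, 1]]
```

Subtraction would work the same way with `x - y` in place of `x + y`.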

Matrix-vector multiplication is where things get a little more interesting since what we are doing here is transforming the vector by putting it into the output space described by the vectors in the matrix.

This is defined as follows: for each row of the matrix, multiply each entry in that row by the corresponding entry of the vector, then add up the results to get the matching entry of the output. Remember that the dimension of the output space is given by the number of rows in the matrix, so you will end up with a vector that has as many dimensions as the matrix has rows. In order for the operation to work, the matrix needs to have as many columns as there are rows in the vector you will be multiplying by. For instance, this transformation is going to scale the existing vector by a factor of 2 in the $x$ direction and by a factor of 2 in the $y$ direction:

$\begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 3 \\ 1 \end{bmatrix} = \begin{bmatrix} 6 \\ 2 \end{bmatrix}$
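The row-by-row rule described above can be sketched as a small function. This is a minimal illustration rather than a library routine: `mat_vec` is a hypothetical name, the matrix is assumed to be a list of rows, and the column vector is assumed to be a flat list.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (flat list)."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("the matrix needs one column per vector entry")
    # One output entry per matrix row: multiply matching entries, then sum.
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


print(mat_vec([[2, 0], [0, 2]], [3, 1]))  # [6, 2]
```

The output list has one entry per matrix row, which matches the point about the output having as many dimensions as the matrix has rows.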

What is more interesting is when you have transformations that are not pure scaling. For instance, this transformation is going to move the vector 1 unit in the $x$ direction for every unit that it has in the $y$ direction. Such a transformation is called a shear.

$\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 5 \\ 3 \end{bmatrix}$
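Written out entry by entry, the shear works like this: the top row adds the vector's $y$ component to its $x$ component, while the bottom row leaves $y$ alone.

$\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \cdot 2 + 1 \cdot 3 \\ 0 \cdot 2 + 1 \cdot 3 \end{bmatrix} = \begin{bmatrix} 5 \\ 3 \end{bmatrix}$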

As we do these transformations, pay attention to the two yellow vectors and their relationship with the magenta vector. The two yellow vectors are something called a basis for the 2D space, something we will revisit later. They are being interpolated from their default position to the position specified in the matrix.

This one just takes every step in the x direction and turns it into a step of the same size in the y direction, and vice versa. In other words, it swaps the two components, so it is a reflection across the line $y = x$.

$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$

This one actually rotates the whole system clockwise by 90 degrees, by moving down in the y direction for every step in the x direction and moving right in the x direction for every step in the y direction.

$\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 3 \\ -2 \end{bmatrix}$
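One way to see why the basis vectors end up "at the position specified in the matrix" is to push $(1, 0)$ and $(0, 1)$ through each transformation: the results are exactly the matrix's columns. The sketch below assumes matrices stored as lists of rows; `mat_vec` and the `transforms` dictionary are names made up for this example.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (flat list)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# The four example transformations used in this section.
transforms = {
    "scale": [[2, 0], [0, 2]],
    "shear": [[1, 1], [0, 1]],
    "reflection": [[0, 1], [1, 0]],
    "rotation": [[0, 1], [-1, 0]],
}

for name, matrix in transforms.items():
    first_basis = mat_vec(matrix, [1, 0])   # lands on the first column
    second_basis = mat_vec(matrix, [0, 1])  # lands on the second column
    print(name, first_basis, second_basis)

print(mat_vec(transforms["rotation"], [2, 3]))  # [3, -2]
```

For the rotation, $(1, 0)$ lands on $(0, -1)$ and $(0, 1)$ lands on $(1, 0)$, which is the quarter turn described above.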

What about multiplying two matrices? Well, we can take what we know about matrices (the fact that they are sets of vectors, that the rows represent the input space and that the columns represent the output space) together with our definition of matrix-vector multiplication, and multiply two matrices by multiplying each column vector in the right-hand matrix by the left-hand matrix.

To make things a little simpler, I have color-coded the vectors in the right-hand matrix. The yellow vector represents the first column and the magenta vector represents the second column.

$\begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 3 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 6 & 2 \\ 2 & 2 \end{bmatrix}$

Again, notice what happens to each vector in the second matrix as the transformation in the first is applied to it. In the second matrix, we had the column vector $(1, 1)$, which was scaled by 2 in the x direction and by 2 in the y direction. So it ended up at $(2, 2)$.

Similarly, the column vector $(3, 1)$ was also scaled by 2 in the x direction and by 2 in the y direction. So it ended up at $(6, 2)$.

Here is something a little more complicated - we will apply the same rotation that we did earlier.

$\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \begin{bmatrix} 3 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ -3 & -1 \end{bmatrix}$
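The column-by-column description of matrix multiplication translates fairly directly into code. The following is a sketch under the same assumptions as before (matrices as lists of rows; `mat_vec` and `mat_mul` are hypothetical helper names), not an optimized implementation.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (flat list)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def mat_mul(left, right):
    """Multiply two matrices by transforming each column of `right` by `left`."""
    columns = [list(col) for col in zip(*right)]           # split right into columns
    transformed = [mat_vec(left, col) for col in columns]  # transform each column
    return [list(row) for row in zip(*transformed)]        # reassemble as rows


# Scaling applied to the columns (3, 1) and (1, 1).
print(mat_mul([[2, 0], [0, 2]], [[3, 1], [1, 1]]))   # [[6, 2], [2, 2]]

# The rotation applied to the same two columns.
print(mat_mul([[0, 1], [-1, 0]], [[3, 1], [1, 1]]))  # [[1, 1], [-3, -1]]
```

In the rotation example, the column $(3, 1)$ becomes $(1, -3)$ and the column $(1, 1)$ becomes $(1, -1)$, which are the columns of the result.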