Matrix Multiplication
Matrix multiplication is arguably the most important operation when talking about matrices. It's vital in many of the practical applications of linear algebra, like neural networks and computer graphics. In fact, GPUs (graphics processing units - the component in your computer responsible for graphics-related computation) are highly optimized for performing large numbers of matrix multiplications in parallel, which is one reason they are widely used in machine learning applications.
Conformability
Before we multiply two arbitrary matrices, we have to make sure that they are conformable for multiplication.
Recall that matrix addition and subtraction are only defined for certain pairs of matrices: those which have the same shape. In other words, two matrices can only be added or subtracted when they are both of size $m \times n$.
I didn't bring up the concept of conformability in that context because the rule is rather straightforward and intuitive there. Of course, the matrices have to be the same size, because to add them together you must add each element of the first matrix to the corresponding element of the second matrix. By that definition, it wouldn't make sense to add two matrices of different sizes.
When it comes to matrix multiplication, there is also a rule.
You can only multiply two matrices if the first one is of size $m \times n$ and the second is of size $n \times p$.
Or in other words:
Given two matrices $A$ of size $m \times n$ and $B$ of size $p \times q$, their product $AB$ is defined if and only if $n = p$.
This is very important, as matrix multiplication is not defined for pairs of matrices which don't satisfy this condition.
When the condition is satisfied, the matrices are said to be conformable for multiplication.
So always keep this in mind: for two matrices to be multiplied, the number of columns of the first matrix must equal the number of rows of the second matrix (not the other way around - we will get into this later in the lesson).
For example, you can multiply a $2 \times 3$ matrix with a $3 \times 4$ matrix, because $3 = 3$, but you cannot multiply a $2 \times 3$ matrix with a $2 \times 4$ matrix, because $3 \neq 2$.
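As a small sketch of this rule in code (the shapes here are assumed example values, not anything special), NumPy lets you compare the relevant dimensions directly:

```python
import numpy as np

# Assumed example shapes: a 2x3 and a 3x4 matrix are conformable,
# because the first has 3 columns and the second has 3 rows.
A = np.zeros((2, 3))
B = np.zeros((3, 4))
print(A.shape[1] == B.shape[0])  # True: the product A @ B is defined

# A 2x3 and a 2x4 matrix are not conformable: 3 columns vs. 2 rows.
C = np.zeros((2, 4))
print(A.shape[1] == C.shape[0])  # False: A @ C would raise an error
```

NumPy itself performs exactly this check and raises an error when you try to multiply non-conformable matrices.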
Shape of the Product of Two Matrices
Now that we know which matrices can be multiplied, what does the result of this multiplication look like?
Well, for two matrices $A$ of size $m \times n$ and $B$ of size $p \times q$ where $n = p$, the result will be another matrix of size $m \times q$. That is, it will have the same number of rows as the first matrix, and the same number of columns as the second matrix.

For example, if we multiply a matrix of size $2 \times 3$ with a matrix of size $3 \times 4$, their product will be a matrix of the shape $2 \times 4$.
So, to help you remember:
the inner values ($n$ and $p$) must match for matrix multiplication to be defined, and the outer values ($m$ and $q$) will be the shape of the resulting matrix.
Definition
Don't worry if you don't understand the formal definition immediately; I will illustrate it more thoroughly through some concrete numerical examples later.
For two conformable matrices $A$ of size $m \times n$ and $B$ of size $n \times p$, their product $AB$ is a matrix of size $m \times p$ whose $(i, j)$ entry is defined as

$$(AB)_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj} = A_{i*} \cdot B_{*j}$$
Note: If you don't understand the indexing with the star ($*$) notation, you can read through this lesson.
In other words, the $(i, j)$ entry of the matrix product $AB$ is just the dot product of the $i$th row of $A$ with the $j$th column of $B$.
If you remember how the dot product works, you can also think of matrix multiplication as giving you a matrix where the entry at position $(i, j)$ tells you how "close", "similar", or "aligned" the vector from row $i$ of matrix $A$ is to the vector from column $j$ of matrix $B$.

You are basically taking the rows of the first matrix and "comparing" them to the columns of the second matrix.
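The "rows times columns" rule translates almost word for word into code. Here is a minimal sketch in plain Python (the function name `matmul` is my own choice for illustration, not something from the lesson):

```python
def matmul(A, B):
    """Multiply two matrices (given as lists of rows) straight from the
    definition: entry (i, j) is the dot product of row i of A with
    column j of B."""
    m, n, p = len(A), len(A[0]), len(B[0])
    if n != len(B):  # conformability check: columns of A vs. rows of B
        raise ValueError("matrices are not conformable")
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```

In practice you would use an optimized library routine (NumPy's `@` operator, for example), but the triple loop above is exactly what the definition says.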
Numerical Example
Take the matrix

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}$$

and the matrix

$$B = \begin{pmatrix} 7 & 8 \\ 9 & 10 \\ 11 & 12 \end{pmatrix}$$

We compute their product $AB$. I will proceed step-by-step, as opposed to how I usually show examples, since matrix multiplication is slightly more complex than other operations.

Conformability

Are the matrices conformable? Let us have a look: matrix $A$ is of shape $2 \times 3$, whereas matrix $B$ is of shape $3 \times 2$. We can see that the number of columns in matrix $A$ ($3$) is the same as the number of rows in matrix $B$ (also $3$), so we can proceed.

Shape of the Product

Since we will be computing the product of two matrices of size $2 \times 3$ and $3 \times 2$ respectively, the result will have the same number of rows as the first matrix ($A$, which has $2$ rows), and the same number of columns as our second matrix ($B$, which has $2$ columns).

Therefore, our resulting matrix will be of size $2 \times 2$.

Computation

We begin with the element in the top left of the resulting matrix, the element at position $(1, 1)$.

So, as I said above, the element of a product of two matrices at position $(i, j)$ is the dot product of the $i$th row of the first matrix with the $j$th column of the second matrix. In our case, we are concerned with the first row of $A$ and the first column of $B$.

The first row of $A$ is

$$A_{1*} = \begin{pmatrix} 1 & 2 & 3 \end{pmatrix}$$

and the first column of $B$ is

$$B_{*1} = \begin{pmatrix} 7 \\ 9 \\ 11 \end{pmatrix}$$

We compute the dot product, which is given by

$$A_{1*} \cdot B_{*1} = 1 \cdot 7 + 2 \cdot 9 + 3 \cdot 11 = 7 + 18 + 33 = 58$$

So the entry at position $(1, 1)$ of the matrix $AB$ is $58$.

The second entry we will compute is the one at position $(1, 2)$ (first row, second column).

We have

$$A_{1*} = \begin{pmatrix} 1 & 2 & 3 \end{pmatrix}$$

and

$$B_{*2} = \begin{pmatrix} 8 \\ 10 \\ 12 \end{pmatrix}$$

So we have that

$$(AB)_{12} = 1 \cdot 8 + 2 \cdot 10 + 3 \cdot 12 = 8 + 20 + 36 = 64$$

The other elements are computed analogously:

$$(AB)_{21} = 4 \cdot 7 + 5 \cdot 9 + 6 \cdot 11 = 28 + 45 + 66 = 139$$

$$(AB)_{22} = 4 \cdot 8 + 5 \cdot 10 + 6 \cdot 12 = 32 + 50 + 72 = 154$$

Thus,

$$AB = \begin{pmatrix} 58 & 64 \\ 139 & 154 \end{pmatrix}$$
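As a sanity check, a step-by-step product like this can be verified with NumPy. The matrices below are assumed example values (a $2 \times 3$ and a $3 \times 2$ pair):

```python
import numpy as np

# Assumed example values: a 2x3 matrix times a 3x2 matrix.
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[ 7,  8],
              [ 9, 10],
              [11, 12]])

P = A @ B          # the @ operator performs matrix multiplication
print(P.shape)     # (2, 2): rows of A by columns of B
print(P)           # [[ 58  64]
                   #  [139 154]]

# Entry (1, 1) really is the dot product of row 1 of A and column 1 of B:
print(A[0] @ B[:, 0])  # 58
```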
Computing Specific Rows or Columns
If you don't need to calculate the entire product $AB$, and only need a specific column or row, you don't have to calculate everything: only calculate what you need.
For example, take a matrix

$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$$

and a matrix

$$B = \begin{pmatrix} 5 & 6 & 7 \\ 8 & 9 & 10 \end{pmatrix}$$

The first row of $AB$ is given by the matrix-by-vector product of the first row of $A$ (denoted by $A_{1*}$) and the matrix $B$:

$$(AB)_{1*} = A_{1*} B = \begin{pmatrix} 1 & 2 \end{pmatrix} \begin{pmatrix} 5 & 6 & 7 \\ 8 & 9 & 10 \end{pmatrix}$$

The first element of the row is

$$1 \cdot 5 + 2 \cdot 8 = 21$$

the second is

$$1 \cdot 6 + 2 \cdot 9 = 24$$

and the third and final element is

$$1 \cdot 7 + 2 \cdot 10 = 27$$

So the first row of the product $AB$ is

$$(AB)_{1*} = \begin{pmatrix} 21 & 24 & 27 \end{pmatrix}$$

Note that $A_{1*}$ is a row vector, so the matrix multiplication is not the same as when you're multiplying by a column vector. If you don't know the difference, look at this lesson.

If you wanted to compute the second column of $AB$, you could proceed similarly:

$$(AB)_{*2} = A B_{*2} = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 6 \\ 9 \end{pmatrix} = \begin{pmatrix} 24 \\ 54 \end{pmatrix}$$
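In NumPy this corresponds to slicing out a single row or column before multiplying, so nothing else gets computed. The matrices here are assumed example values (a $2 \times 2$ matrix and a $2 \times 3$ matrix):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6,  7],
              [8, 9, 10]])

# First row of A@B: multiply only the first row of A by the whole of B.
row = A[0, :] @ B
print(row)            # [21 24 27]

# Second column of A@B: multiply A by only the second column of B.
col = A @ B[:, 1]
print(col)            # [24 54]
```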
The fact that each entry of a matrix product is computed independently of the other entries makes the computation very easy to parallelize. That means you can split the computation into many small (disjoint) parts, solve these sub-problems individually by assigning computing resources to each one, and then join the results at the end.

GPUs are excellent at this: they have thousands of lightweight cores (think of them as small processing units), which allow them to do very specific things (like matrix multiplication) very quickly.
Properties of Matrix Multiplication
Matrix multiplication behaves quite differently from ordinary multiplication of scalars.
Commutativity Does Not Hold
Just because the product $AB$ is defined, it does not mean that $BA$ is also defined. In fact, both products are only defined when $A$ is of size $m \times n$ and $B$ is of size $n \times m$, and they only have the same shape when both matrices $A$ and $B$ are square.

Even then, just because both products are defined, it does not mean that they are the same. In fact, they're usually different.

Look at this example:

$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$

We compare the products:

$$AB = \begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix}, \qquad BA = \begin{pmatrix} 3 & 4 \\ 1 & 2 \end{pmatrix}$$

As you can see, $AB \neq BA$.
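A quick numerical check of non-commutativity with NumPy (the two square matrices below are assumed example values):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

print(A @ B)   # [[2 1]
               #  [4 3]]
print(B @ A)   # [[3 4]
               #  [1 2]]
print(np.array_equal(A @ B, B @ A))  # False
```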
The Zero-Product Property Does Not Hold
With scalars, given that

$$ab = 0$$

we can safely say that either $a$ or $b$ (or both) is $0$.

This is not true for matrices. Look at this counterexample:

$$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$$

so neither $A$ nor $B$ is the zero matrix, yet

$$AB = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$$
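You can check a counterexample of this kind numerically (the two matrices below are assumed example values; each has a nonzero entry, yet their product is the zero matrix):

```python
import numpy as np

A = np.array([[1, 0],
              [0, 0]])
B = np.array([[0, 0],
              [0, 1]])

# Neither factor is the zero matrix, yet the product is.
print(A.any(), B.any())  # True True: each has at least one nonzero entry
print(A @ B)             # [[0 0]
                         #  [0 0]]
```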
The Cancellation Law Does Not Hold
With scalar multiplication, as long as $a \neq 0$, you can say that

$$ab = ac \quad \implies \quad b = c$$

but this is not true for matrix multiplication:

$$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}, \qquad C = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$$

so $A$ is not the zero matrix and $B \neq C$, yet

$$AB = AC = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}$$
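A numerical check of the failed cancellation law (again with assumed example matrices, where $A$ is nonzero and $B \neq C$):

```python
import numpy as np

A = np.array([[1, 0],
              [0, 0]])
B = np.array([[1, 1],
              [0, 0]])
C = np.array([[1, 1],
              [1, 1]])

# AB equals AC even though A is not the zero matrix and B != C,
# so you cannot "cancel" A from both sides.
print(np.array_equal(A @ B, A @ C))  # True
print(np.array_equal(B, C))          # False
```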
Distributivity
We looked at differences between matrix and scalar multiplication, but they also have some properties in common. One of these is left and right distributivity:

$$A(B + C) = AB + AC$$

and

$$(A + B)C = AC + BC$$
Proof
If you're interested, here is the proof:
Let $A$ be an $m \times n$ matrix and let the matrices $B$ and $C$ have $n$ rows each. We use the definition of matrix multiplication to show that for each row $i$ and column $j$

$$(A(B + C))_{ij} = \sum_{k=1}^{n} A_{ik} (B + C)_{kj}$$

Then, according to the definition of the addition of matrices,

$$(A(B + C))_{ij} = \sum_{k=1}^{n} A_{ik} (B_{kj} + C_{kj})$$

and because $A_{ik}$, $B_{kj}$, and $C_{kj}$ are just scalars, we can use the usual distributivity property to say that

$$(A(B + C))_{ij} = \sum_{k=1}^{n} \left( A_{ik} B_{kj} + A_{ik} C_{kj} \right) = \sum_{k=1}^{n} A_{ik} B_{kj} + \sum_{k=1}^{n} A_{ik} C_{kj}$$

Again, according to the definition of matrix multiplication,

$$\sum_{k=1}^{n} A_{ik} B_{kj} + \sum_{k=1}^{n} A_{ik} C_{kj} = (AB)_{ij} + (AC)_{ij}$$

We have shown that this holds for all $i$ and $j$, so it must hold for the entire matrix. Hence

$$A(B + C) = AB + AC$$
The proof for right-distributivity is analogous.
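You can also spot-check both distributivity laws numerically. The sketch below uses random integer matrices (the shapes are chosen arbitrarily), so the equality comparisons are exact:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(2, 3))
B = rng.integers(-5, 5, size=(3, 4))
C = rng.integers(-5, 5, size=(3, 4))
D = rng.integers(-5, 5, size=(2, 3))

# Integer matrices, so these comparisons involve no rounding error.
print(np.array_equal(A @ (B + C), A @ B + A @ C))  # left distributivity
print(np.array_equal((A + D) @ B, A @ B + D @ B))  # right distributivity
```

Of course, a numerical check on a few random matrices is not a proof; the proof above is what guarantees it for all conformable matrices.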
Associativity
Another thing which matrix and scalar multiplication have in common is the associative property:

$$(AB)C = A(BC)$$
Proof
If you're interested, here is the proof for associativity:
Let $A$, $B$, and $C$ be conformable matrices, where $A$ has $n$ columns and $B$ has $n$ rows. Remember how we compute a specific column of the product of two matrices (as shown above):

$$(AB)_{*j} = A B_{*j}$$

Keeping this in mind, we can write out the $j$th column of $(AB)C$: a matrix-by-vector product is a linear combination of the columns of the matrix, so

$$((AB)C)_{*j} = (AB) C_{*j} = \sum_{k} (AB)_{*k} C_{kj}$$

We then use what we just showed before (how to compute specific columns of the product of two matrices):

$$\sum_{k} (AB)_{*k} C_{kj} = \sum_{k} (A B_{*k}) C_{kj}$$

Since $C_{kj}$ is a scalar, it does not matter whether we multiply by it before or after performing the product between $A$ and $B_{*k}$.

Therefore,

$$\sum_{k} (A B_{*k}) C_{kj} = A \sum_{k} B_{*k} C_{kj} = A (B C_{*j}) = (A(BC))_{*j}$$

Since this holds for every column, it must be the case that

$$(AB)C = A(BC)$$
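As with distributivity, a quick numerical spot-check is easy (random integer matrices with arbitrarily chosen conformable shapes):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(-5, 5, size=(2, 3))
B = rng.integers(-5, 5, size=(3, 4))
C = rng.integers(-5, 5, size=(4, 2))

# Integer matrices, so the comparison is exact.
print(np.array_equal((A @ B) @ C, A @ (B @ C)))  # associativity
```

Associativity matters in practice: $(AB)C$ and $A(BC)$ give the same matrix, but depending on the shapes involved, one order of evaluation can require far fewer arithmetic operations than the other.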
Connection to the Matrix-by-Vector Product
As you may have already noticed, the matrix-by-vector product is just a special case of matrix multiplication.
You are basically multiplying a matrix by another matrix with only one column (which is a column vector).
In fact, you can also multiply a matrix by a row vector, but usually the row vector will have to be on the left of the multiplication. That is because a row vector has shape $1 \times n$. If it appeared on the right of a product, the matrix on the left would need to have only one column for the multiplication to be defined (which would make that matrix a column vector itself).
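A short NumPy illustration of this shape difference (the matrix and vectors are assumed example values):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
v_col = np.array([[5],
                  [6]])       # a 2x1 column vector
v_row = np.array([[5, 6]])    # a 1x2 row vector

print((A @ v_col).shape)   # (2, 1): column vector on the right works
print((v_row @ A).shape)   # (1, 2): row vector on the left works
# A @ v_row would raise an error: shapes (2, 2) and (1, 2)
# are not conformable.
```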
By the way, the arrow notation ($\vec{v}$) is conventionally reserved for column vectors. You could write a row vector as $\vec{v}^T$ (a transposed column vector), but you can also just explicitly state that some letter denotes a row vector and use that.
One Final Remark
The definition might seem arbitrary, but it will hopefully make sense later on in the course. As a small teaser: the definition is chosen this way because a matrix represents a linear transformation, and the multiplication of two matrices is supposed to correspond to the composition of the two transformations represented by those matrices. Thanks to the definition of matrix multiplication, this is exactly the effect we get. We will get into linear transformations later in the course.