Matrix Multiplication
Matrix multiplication is arguably the most important operation when talking about matrices. It's vital in many of the practical applications of linear algebra, like neural networks and computer graphics. In fact, GPUs (graphics processing units - the component in your computer responsible for graphics-related computation) are highly optimized for performing large numbers of matrix multiplications in parallel, which is one reason they are widely used in machine learning applications.
Conformability
Before we multiply two arbitrary matrices, we have to make sure that they are conformable for multiplication.
Recall that matrix addition and subtraction are only defined for certain pairs of matrices: those which have the same shape. In other words, two matrices can only be added or subtracted when they are both of size $m \times n$.
I didn't bring up the concept of conformability in that context because the rule is rather straightforward and intuitive there. Of course, the matrices have to be the same size, because to add them together you must add each element of the first matrix to the corresponding element of the second matrix. By that definition, it wouldn't make sense to add two matrices of different sizes.
When it comes to matrix multiplication, there is also a rule.
You can only multiply two matrices if the first one is of size $m \times n$ and the second is of size $n \times p$.
Or in other words:
Given two matrices $A$ of size $m \times n$ and $B$ of size $p \times q$, their product $AB$ is defined if and only if $n = p$.
This is very important, as matrix multiplication is not defined for pairs of matrices which don't satisfy this condition.
When the condition is satisfied, the matrices are said to be conformable for multiplication.
So always keep this in mind: for two matrices to be multiplied, the number of columns of the first matrix must equal the number of rows of the second matrix (not the other way around - we will get into this later in the lesson).
For example, you can multiply a $2 \times 3$ matrix with a $3 \times 4$ matrix, because $3 = 3$, but you cannot multiply a $2 \times 3$ matrix with a $2 \times 4$ matrix, because $3 \neq 2$.
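As a small sketch of this rule in code (the shapes here are assumed example values, not anything special), NumPy lets you compare the relevant dimensions directly:

```python
import numpy as np

# Assumed example shapes: a 2x3 and a 3x4 matrix are conformable,
# because the first has 3 columns and the second has 3 rows.
A = np.zeros((2, 3))
B = np.zeros((3, 4))
print(A.shape[1] == B.shape[0])  # True: the product A @ B is defined

# A 2x3 and a 2x4 matrix are not conformable: 3 columns vs. 2 rows.
C = np.zeros((2, 4))
print(A.shape[1] == C.shape[0])  # False: A @ C would raise an error
```

NumPy itself performs exactly this check and raises an error when you try to multiply non-conformable matrices.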
Shape of the Product of Two Matrices
Now that we know which matrices can be multiplied, what does the result of this multiplication look like?
Well, for two matrices $A$ of size $m \times n$ and $B$ of size $p \times q$ where $n = p$, the result will be another matrix of size $m \times q$. That is, it will have the same number of rows as the first matrix, and the same number of columns as the second matrix.

For example, if we multiply a matrix of size $2 \times 3$ with a matrix of size $3 \times 4$, their product will be a matrix of the shape $2 \times 4$.
So, to help you remember:
the inner values ($n$ and $p$) must match for matrix multiplication to be defined, and the outer values ($m$ and $q$) will be the shape of the resulting matrix.
Definition
Don't worry if you don't understand the formal definition immediately; I will illustrate it more thoroughly through some concrete numerical examples later.
For two conformable matrices $A$ of size $m \times n$ and $B$ of size $n \times p$, their product $AB$ is a matrix of size $m \times p$ whose $(i, j)$ entry is defined as

$$(AB)_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj} = A_{i*} \cdot B_{*j}$$
Note: If you don't understand the indexing with the star ($*$) notation, you can read through this lesson.
In other words, the $(i, j)$ entry of the matrix product $AB$ is just the dot product of the $i$th row of $A$ with the $j$th column of $B$.
If you remember how the dot product works, you can also think of matrix multiplication as giving you a matrix where the entry at position $(i, j)$ tells you how "close", "similar", or "aligned" the vector from row $i$ of matrix $A$ is to the vector from column $j$ of matrix $B$.

You are basically taking the rows of the first matrix and "comparing" them to the columns of the second matrix.
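The "rows times columns" rule translates almost word for word into code. Here is a minimal sketch in plain Python (the function name `matmul` is my own choice for illustration, not something from the lesson):

```python
def matmul(A, B):
    """Multiply two matrices (given as lists of rows) straight from the
    definition: entry (i, j) is the dot product of row i of A with
    column j of B."""
    m, n, p = len(A), len(A[0]), len(B[0])
    if n != len(B):  # conformability check: columns of A vs. rows of B
        raise ValueError("matrices are not conformable")
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```

In practice you would use an optimized library routine (NumPy's `@` operator, for example), but the triple loop above is exactly what the definition says.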
Numerical Example
Take the matrix

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}$$

and the matrix

$$B = \begin{pmatrix} 7 & 8 \\ 9 & 10 \\ 11 & 12 \end{pmatrix}$$

We compute their product $AB$. I will proceed step-by-step, as opposed to how I usually show examples, since matrix multiplication is slightly more complex than other operations.

Conformability

Are the matrices conformable? Let us have a look: matrix $A$ is of shape $2 \times 3$, whereas matrix $B$ is of shape $3 \times 2$. We can see that the number of columns in matrix $A$ ($3$) is the same as the number of rows in matrix $B$ (also $3$), so we can proceed.

Shape of the Product

Since we will be computing the product of two matrices of size $2 \times 3$ and $3 \times 2$ respectively, the result will have the same number of rows as the first matrix ($A$, which has $2$ rows), and the same number of columns as our second matrix ($B$, which has $2$ columns).

Therefore, our resulting matrix will be of size $2 \times 2$.

Computation

We begin with the element in the top left of the resulting matrix, the element at position $(1, 1)$.

So, as I said above, the element of a product of two matrices at position $(i, j)$ is the dot product of the $i$th row of the first matrix with the $j$th column of the second matrix. In our case, we are concerned with the first row of $A$ and the first column of $B$.

The first row of $A$ is

$$A_{1*} = \begin{pmatrix} 1 & 2 & 3 \end{pmatrix}$$

and the first column of $B$ is

$$B_{*1} = \begin{pmatrix} 7 \\ 9 \\ 11 \end{pmatrix}$$

We compute the dot product, which is given by

$$A_{1*} \cdot B_{*1} = 1 \cdot 7 + 2 \cdot 9 + 3 \cdot 11 = 7 + 18 + 33 = 58$$

So the entry at position $(1, 1)$ of the matrix $AB$ is $58$.

The second entry we will compute is the one at position $(1, 2)$ (first row, second column).

We have

$$A_{1*} = \begin{pmatrix} 1 & 2 & 3 \end{pmatrix}$$

and

$$B_{*2} = \begin{pmatrix} 8 \\ 10 \\ 12 \end{pmatrix}$$

So we have that

$$(AB)_{12} = 1 \cdot 8 + 2 \cdot 10 + 3 \cdot 12 = 8 + 20 + 36 = 64$$

The other elements are computed analogously:

$$(AB)_{21} = 4 \cdot 7 + 5 \cdot 9 + 6 \cdot 11 = 28 + 45 + 66 = 139$$

$$(AB)_{22} = 4 \cdot 8 + 5 \cdot 10 + 6 \cdot 12 = 32 + 50 + 72 = 154$$

Thus,

$$AB = \begin{pmatrix} 58 & 64 \\ 139 & 154 \end{pmatrix}$$
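As a sanity check, a step-by-step product like this can be verified with NumPy. The matrices below are assumed example values (a $2 \times 3$ and a $3 \times 2$ pair):

```python
import numpy as np

# Assumed example values: a 2x3 matrix times a 3x2 matrix.
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[ 7,  8],
              [ 9, 10],
              [11, 12]])

P = A @ B          # the @ operator performs matrix multiplication
print(P.shape)     # (2, 2): rows of A by columns of B
print(P)           # [[ 58  64]
                   #  [139 154]]

# Entry (1, 1) really is the dot product of row 1 of A and column 1 of B:
print(A[0] @ B[:, 0])  # 58
```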
Computing Specific Rows or Columns
If you don't need to calculate the entire product $AB$, and only need a specific column or row, you don't have to calculate everything: only calculate what you need.
For example, take a matrix

$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$$

and a matrix

$$B = \begin{pmatrix} 5 & 6 & 7 \\ 8 & 9 & 10 \end{pmatrix}$$

The first row of $AB$ is given by the matrix-by-vector product of the first row of $A$ (denoted by $A_{1*}$) and the matrix $B$:

$$(AB)_{1*} = A_{1*} B = \begin{pmatrix} 1 & 2 \end{pmatrix} \begin{pmatrix} 5 & 6 & 7 \\ 8 & 9 & 10 \end{pmatrix}$$

The first element of the row is

$$1 \cdot 5 + 2 \cdot 8 = 21$$

the second is

$$1 \cdot 6 + 2 \cdot 9 = 24$$

and the third and final element is

$$1 \cdot 7 + 2 \cdot 10 = 27$$

So the first row of the product $AB$ is

$$(AB)_{1*} = \begin{pmatrix} 21 & 24 & 27 \end{pmatrix}$$

Note that $A_{1*}$ is a row vector, so the matrix multiplication is not the same as when you're multiplying by a column vector. If you don't know the difference, look at this lesson.

If you wanted to compute the second column of $AB$, you could proceed similarly:

$$(AB)_{*2} = A B_{*2} = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 6 \\ 9 \end{pmatrix} = \begin{pmatrix} 24 \\ 54 \end{pmatrix}$$
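In NumPy this corresponds to slicing out a single row or column before multiplying, so nothing else gets computed. The matrices here are assumed example values (a $2 \times 2$ matrix and a $2 \times 3$ matrix):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6,  7],
              [8, 9, 10]])

# First row of A@B: multiply only the first row of A by the whole of B.
row = A[0, :] @ B
print(row)            # [21 24 27]

# Second column of A@B: multiply A by only the second column of B.
col = A @ B[:, 1]
print(col)            # [24 54]
```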
The fact that each entry of a matrix product is computed independently of the other entries makes the computation very easy to parallelize. That means you can split the computation into many small (disjoint) parts, solve these sub-problems individually by assigning computing resources to each one, and then join the results at the end.

GPUs are excellent at this: they have thousands of lightweight cores (think of them as small processing units), which allow them to do very specific things (like matrix multiplication) very quickly.
Properties of Matrix Multiplication
Matrix multiplication behaves quite differently from ordinary multiplication of scalars.
Commutativity Does Not Hold
Just because the product $AB$ is defined, it does not mean that $BA$ is also defined. In fact, both products are only defined when $A$ is of size $m \times n$ and $B$ is of size $n \times m$, and they only have the same shape when both matrices $A$ and $B$ are square.

Even then, just because both products are defined, it does not mean that they are the same. In fact, they're usually different.

Look at this example:

$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$

We compare the products:

$$AB = \begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix}, \qquad BA = \begin{pmatrix} 3 & 4 \\ 1 & 2 \end{pmatrix}$$

As you can see, $AB \neq BA$.
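A quick numerical check of non-commutativity with NumPy (the two square matrices below are assumed example values):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

print(A @ B)   # [[2 1]
               #  [4 3]]
print(B @ A)   # [[3 4]
               #  [1 2]]
print(np.array_equal(A @ B, B @ A))  # False
```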
The Zero-Product Property Does Not Hold
With scalars, given that

$$ab = 0$$

we can safely say that either $a$ or $b$ (or both) is $0$.

This is not true for matrices. Look at this counterexample:

$$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$$

so neither $A$ nor $B$ is the zero matrix, yet

$$AB = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$$
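You can check a counterexample of this kind numerically (the two matrices below are assumed example values; each has a nonzero entry, yet their product is the zero matrix):

```python
import numpy as np

A = np.array([[1, 0],
              [0, 0]])
B = np.array([[0, 0],
              [0, 1]])

# Neither factor is the zero matrix, yet the product is.
print(A.any(), B.any())  # True True: each has at least one nonzero entry
print(A @ B)             # [[0 0]
                         #  [0 0]]
```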
The Cancellation Law Does Not Hold
With scalar multiplication, as long as $a \neq 0$, you can say that

$$ab = ac \quad \implies \quad b = c$$

but this is not true for matrix multiplication:

$$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}, \qquad C = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$$

so $A$ is not the zero matrix and $B \neq C$, yet

$$AB = AC = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}$$
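A numerical check of the failed cancellation law (again with assumed example matrices, where $A$ is nonzero and $B \neq C$):

```python
import numpy as np

A = np.array([[1, 0],
              [0, 0]])
B = np.array([[1, 1],
              [0, 0]])
C = np.array([[1, 1],
              [1, 1]])

# AB equals AC even though A is not the zero matrix and B != C,
# so you cannot "cancel" A from both sides.
print(np.array_equal(A @ B, A @ C))  # True
print(np.array_equal(B, C))          # False
```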
Distributivity
We looked at differences between matrix and scalar multiplication, but they also have some properties in common. One of these is left and right distributivity:

$$A(B + C) = AB + AC$$

and

$$(A + B)C = AC + BC$$
Proof
If you're interested, here is the proof:
Let $A$ be an $m \times n$ matrix and let the matrices $B$ and $C$ have $n$ rows each. We use the definition of matrix multiplication to show that for each row $i$ and column $j$

$$(A(B + C))_{ij} = \sum_{k=1}^{n} A_{ik} (B + C)_{kj}$$

Then, according to the definition of the addition of matrices,

$$(A(B + C))_{ij} = \sum_{k=1}^{n} A_{ik} (B_{kj} + C_{kj})$$

and because $A_{ik}$, $B_{kj}$, and $C_{kj}$ are just scalars, we can use the usual distributivity property to say that

$$(A(B + C))_{ij} = \sum_{k=1}^{n} \left( A_{ik} B_{kj} + A_{ik} C_{kj} \right) = \sum_{k=1}^{n} A_{ik} B_{kj} + \sum_{k=1}^{n} A_{ik} C_{kj}$$

Again, according to the definition of matrix multiplication,

$$\sum_{k=1}^{n} A_{ik} B_{kj} + \sum_{k=1}^{n} A_{ik} C_{kj} = (AB)_{ij} + (AC)_{ij}$$

We have shown that this holds for all $i$ and $j$, so it must hold for the entire matrix. Hence

$$A(B + C) = AB + AC$$
The proof for right-distributivity is analogous.
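You can also spot-check both distributivity laws numerically. The sketch below uses random integer matrices (the shapes are chosen arbitrarily), so the equality comparisons are exact:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(2, 3))
B = rng.integers(-5, 5, size=(3, 4))
C = rng.integers(-5, 5, size=(3, 4))
D = rng.integers(-5, 5, size=(2, 3))

# Integer matrices, so these comparisons involve no rounding error.
print(np.array_equal(A @ (B + C), A @ B + A @ C))  # left distributivity
print(np.array_equal((A + D) @ B, A @ B + D @ B))  # right distributivity
```

Of course, a numerical check on a few random matrices is not a proof; the proof above is what guarantees it for all conformable matrices.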
Associativity
Another thing which matrix and scalar multiplication have in common is the associative property:

$$(AB)C = A(BC)$$
Proof
If you're interested, here is the proof for associativity:
Let $A$, $B$, and $C$ be conformable matrices, where $A$ has $n$ columns and $B$ has $n$ rows. Remember how we compute a specific column of the product of two matrices (as shown above):

$$(AB)_{*j} = A B_{*j}$$

Keeping this in mind, we can write out the $j$th column of $(AB)C$: a matrix-by-vector product is a linear combination of the columns of the matrix, so

$$((AB)C)_{*j} = (AB) C_{*j} = \sum_{k} (AB)_{*k} C_{kj}$$

We then use what we just showed before (how to compute specific columns of the product of two matrices):

$$\sum_{k} (AB)_{*k} C_{kj} = \sum_{k} (A B_{*k}) C_{kj}$$

Since $C_{kj}$ is a scalar, it does not matter whether we multiply by it before or after performing the product between $A$ and $B_{*k}$.

Therefore,

$$\sum_{k} (A B_{*k}) C_{kj} = A \sum_{k} B_{*k} C_{kj} = A (B C_{*j}) = (A(BC))_{*j}$$

Since this holds for every column, it must be the case that

$$(AB)C = A(BC)$$
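As with distributivity, a quick numerical spot-check is easy (random integer matrices with arbitrarily chosen conformable shapes):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(-5, 5, size=(2, 3))
B = rng.integers(-5, 5, size=(3, 4))
C = rng.integers(-5, 5, size=(4, 2))

# Integer matrices, so the comparison is exact.
print(np.array_equal((A @ B) @ C, A @ (B @ C)))  # associativity
```

Associativity matters in practice: $(AB)C$ and $A(BC)$ give the same matrix, but depending on the shapes involved, one order of evaluation can require far fewer arithmetic operations than the other.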
Connection to the Matrix-by-Vector Product
As you may have already noticed, the matrix-by-vector product is just a special case of matrix multiplication.
You are basically multiplying a matrix by another matrix with only one column (which is a column vector).
In fact, you can also multiply a matrix by a row vector, but usually the row vector will have to be on the left of the multiplication. That is because a row vector has shape $1 \times n$. If it appeared on the right of a product, the matrix on the left would need to have only one column for the multiplication to be defined (which would make that matrix a column vector itself).
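A short NumPy illustration of this shape difference (the matrix and vectors are assumed example values):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
v_col = np.array([[5],
                  [6]])       # a 2x1 column vector
v_row = np.array([[5, 6]])    # a 1x2 row vector

print((A @ v_col).shape)   # (2, 1): column vector on the right works
print((v_row @ A).shape)   # (1, 2): row vector on the left works
# A @ v_row would raise an error: shapes (2, 2) and (1, 2)
# are not conformable.
```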
By the way, the arrow notation ($\vec{v}$) is conventionally reserved for column vectors. You could write a row vector as $\vec{v}^T$ (a transposed column vector), but you can also just explicitly state that some letter denotes a row vector and use that.
One Final Remark
The definition might seem arbitrary, but it will hopefully make sense later on in the course. As a small teaser: the definition is chosen this way because a matrix represents a linear transformation, and the multiplication of two matrices is supposed to correspond to the composition of the two transformations represented by those matrices. Thanks to the definition of matrix multiplication, this is exactly the effect we get. We will get into linear transformations later in the course.