Visualizing matrices
Ever since I started studying data science, I have been in the habit of visualizing matrices, either on paper or in my head. Visualizing the shapes of the matrices that take part in an algebraic expression is a really neat way to debug your matrix operations! It comes in handy when working with NumPy or MATLAB, when reading the literature, and when writing papers. In this note, I show you a couple of examples!
Multiplying two matrices
Suppose we want to multiply two matrices, \(\mathbf{A} \in \mathbb{R}^{p \times q}\) and \(\mathbf{B} \in \mathbb{R}^{q \times r}\), and infer the shape of the resulting matrix, \(\mathbf{U}\):
\(\mathbf{U} = \mathbf{A} \cdot \mathbf{B}\)
The interior dimensions of the two multiplied matrices need to match, since this shared dimension, \(q\), is the one along which the dot products between the rows of \(\mathbf{A}\) and the columns of \(\mathbf{B}\) are computed:
The remaining dimensions, \(p\) and \(r\), are “free”, in the sense that they can be anything, and they dictate the dimensions of the resulting matrix, \(\mathbf{U}\).
So, in this example, \(\mathbf{U} \in \mathbb{R}^{p \times r}\).
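If you want to sanity-check this in NumPy, a minimal sketch (with arbitrary placeholder dimensions \(p = 2\), \(q = 3\), \(r = 4\)) might look like this:

```python
import numpy as np

p, q, r = 2, 3, 4         # arbitrary placeholder dimensions
A = np.random.rand(p, q)  # A is p-by-q
B = np.random.rand(q, r)  # B is q-by-r; interior dimension q matches A's

U = A @ B                 # matrix product
print(U.shape)            # (2, 4), i.e. (p, r)
```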
Multiplying three or more matrices
Need to multiply three or more matrices? No problem! Just stack the next matrix at the bottom of your drawing and repeat!
Let’s take \(\mathbf{A} \in \mathbb{R}^{p \times q}\), \(\mathbf{B} \in \mathbb{R}^{q \times r}\) and \(\mathbf{C} \in \mathbb{R}^{r \times s}\). The matrix multiplication is as follows:
\(\mathbf{U} = \mathbf{A} \cdot \mathbf{B} \cdot \mathbf{C}\)
Now read your matrix multiplication equation from right to left and draw it from top to bottom, stacking the consecutive matrices:
Here, \(p\) and \(s\) are “free” dimensions: they can be anything, and they dictate the shape of the resulting matrix, while the interior dimensions \(q\) and \(r\) must match pairwise. In this example, \(\mathbf{U} \in \mathbb{R}^{p \times s}\).
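Again as a rough NumPy sketch (dimensions chosen arbitrarily), the interior dimensions \(q\) and \(r\) cancel out and only the free dimensions survive:

```python
import numpy as np

p, q, r, s = 2, 3, 4, 5   # arbitrary placeholder dimensions
A = np.random.rand(p, q)
B = np.random.rand(q, r)
C = np.random.rand(r, s)

U = A @ B @ C             # interior dimensions q and r must match pairwise
print(U.shape)            # (2, 5), i.e. (p, s)
```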
Multiplying vectors
You can apply the same strategy to perform vector-matrix multiplications or vector-vector multiplications. Let’s look at two more examples where we multiply two vectors, \(\mathbf{a}\) and \(\mathbf{b}\).
In the first example, we compute a classic dot product between two vectors of the same length. Let’s take \(\mathbf{a} \in \mathbb{R}^{p}\) and \(\mathbf{b} \in \mathbb{R}^{p}\), treating both as row vectors (so \(\mathbf{a}\) is \(1 \times p\) and \(\mathbf{b}^{\top}\) is \(p \times 1\)), and form their vector (dot) product:
\(u = \mathbf{a} \cdot \mathbf{b}^{\top}\)
This multiplication of vectors results in a single value (a scalar), \(u\):
Note that the picture above makes it clear why we couldn’t compute \(\mathbf{a} \cdot \mathbf{b}^{\top}\) if \(\mathbf{a}\) and \(\mathbf{b}\) had different lengths: there simply wouldn’t be enough elements in the shorter vector to complete the dot product.
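In NumPy you don’t even need the transpose for this case: 1-D arrays carry no row/column orientation, so `@` (or `np.dot`) computes the scalar product directly. A minimal sketch, with an arbitrary placeholder length \(p = 3\):

```python
import numpy as np

p = 3                  # arbitrary placeholder length
a = np.random.rand(p)
b = np.random.rand(p)

u = a @ b              # dot product of two length-p 1-D arrays
print(u)               # a single scalar

# Mismatched lengths fail, just as the picture suggests:
# np.random.rand(p) @ np.random.rand(p + 1)  -> ValueError
```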
On the other hand, we can multiply two vectors of different lengths, say \(\mathbf{a} \in \mathbb{R}^{p}\) and \(\mathbf{b} \in \mathbb{R}^{q}\), in the following way:
\(\mathbf{U} = \mathbf{a}^{\top} \cdot \mathbf{b}\)
simply because it is still the interior dimension of the two multiplied vectors that has to match, and in this case that dimension is equal to 1 (a \(p \times 1\) column times a \(1 \times q\) row)! The result of such a multiplication, known as the outer product, is an entire matrix, \(\mathbf{U} \in \mathbb{R}^{p \times q}\):
As in the matrix-matrix multiplication examples, \(p\) and \(q\) are “free” dimensions: they can be anything, and they dictate the shape of the resulting matrix.
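As a final NumPy sketch (lengths again arbitrary placeholders), `np.outer` computes exactly this, or you can reshape the vectors into an explicit column and row first:

```python
import numpy as np

p, q = 3, 4            # arbitrary placeholder lengths
a = np.random.rand(p)
b = np.random.rand(q)

U = np.outer(a, b)                     # outer product: p-by-q matrix
V = a.reshape(p, 1) @ b.reshape(1, q)  # the same thing as an explicit matrix product

print(U.shape)            # (3, 4), i.e. (p, q)
print(np.allclose(U, V))  # True
```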