PCoA & Multidimensional Scaling - demystify dimensionality reduction techniques (2)
In my last post, I wrote about PCA. Now it’s time to talk about PCoA, or principal coordinate analysis, and MDS, or multidimensional scaling. These techniques are widely used in population genetic studies and beyond. For instance, the figure below from Pemberton et al. 2012 uses MDS plots to visualize pairwise correlations between genome-wide ROH (runs of homozygosity) frequencies in individual populations.
PCoA
What is principal coordinate analysis, and why does it look so similar to principal component analysis? Are they the same thing?
TL;DR: PCA and PCoA are in general different methods, but they share some elements. A special case of PCoA is the same as PCA.
PCA vs. PCoA
PCA and PCoA have many similarities: both are techniques used for multivariate analysis; both are commonly used dimensionality reduction methods; and both involve eigenanalysis in their derivation.
PCA extracts new dimensions (attributes) from the original attributes of the data while preserving as much information as possible, whereas PCoA focuses on representing the distances between objects in a low-dimensional space.
We can calculate distances from attributes, but typically not the other way around. From this perspective, PCoA has a wider range of applications than PCA, especially when only the distances (or similarities) between objects are available. Moreover, PCoA works better than PCA when there are many missing values and when there are more features than objects.
Choosing between the two methods is usually not a difficult decision; it mainly depends on the type of data you have. But first, we need to understand the math behind PCoA.
Distances? Dissimilarities?
Suppose we have a dissimilarity matrix $D = (d_{ij})_{n \times n}$, where each entry $d_{ij}$ is the dissimilarity between a pair of objects $i$ and $j$. Note that for PCoA, the dissimilarity needs to be a distance measure, which means it must satisfy certain criteria that come from the definition of a metric space:
- Positivity: $d_{ij} \ge 0$, with $d_{ij} = 0$ if and only if $i = j$
- Symmetry: $d_{ij} = d_{ji}$
- Triangle inequality: $d_{ij} \le d_{ik} + d_{kj}$
Euclidean distance is probably the most common distance measure we see. Other famous distance measures include Manhattan distance, Cosine distance, Hamming distance, …
There also exist (non-metric) distance-like measures which we call, in general, dissimilarities. However, PCoA is not suitable for these general dissimilarities.
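As a quick illustration, here is a minimal Python sketch (using SciPy, with a made-up 5×3 data matrix) that computes a few of the distance matrices mentioned above; cosine distance is included only for comparison, since it is a dissimilarity rather than a true metric.

```python
# A minimal sketch: pairwise distance matrices with SciPy.
# The 5x3 data matrix here is made up purely for illustration.
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                            # 5 objects, 3 attributes

D_euclidean = squareform(pdist(X, metric="euclidean"))
D_manhattan = squareform(pdist(X, metric="cityblock"))
D_cosine    = squareform(pdist(X, metric="cosine"))    # distance-like, not a true metric

print(D_euclidean.shape)                               # (5, 5): symmetric, zero diagonal
```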
Basics of the Torgerson classical scaling
The objective of PCoA is to accurately represent the distances between objects using the smallest possible number of dimensions.
The Torgerson classical scaling is based on the following:
If there is a set of $n$ points that can be perfectly represented spatially in an $r$-dimensional space, then the Euclidean distance between points $i$ and $j$ is
$$d_{ij} = \sqrt{\sum_{k=1}^{r} (x_{ik} - x_{jk})^2}.$$
Putting the points in matrix form as $X = (x_{ik})_{n \times r}$, with one point per row, the matrix
$$B = XX^T$$
is essentially the double-centered squared distance matrix.
What is this?
A centering matrix $C$ of size $n \times n$ is
$$C = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T,$$
where $I$ is the identity matrix and $\mathbf{1}$ is the vector of all ones of length $n$.
The squared distance matrix is $D^{(2)} = (d_{ij}^2)_{n \times n}$.
Double-centering the squared distance matrix gives
$$B = -\frac{1}{2}\, C D^{(2)} C.$$
We can easily work out some math to check that
$$-\frac{1}{2}\, C D^{(2)} C = XX^T,$$
provided the points in $X$ are centered at the origin, which we can always assume because distances are unaffected by translation.
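If you prefer a numerical check to the algebra, here is a short NumPy sketch (with a made-up centered configuration) verifying that double-centering the squared Euclidean distances recovers the Gram matrix:

```python
# Numerical check of the identity above: for a centered configuration X,
# -1/2 * C * D^(2) * C equals X X^T.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
X = X - X.mean(axis=0)                                       # center the points

D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)     # squared distances
C = np.eye(6) - np.ones((6, 6)) / 6                          # centering matrix
B = -0.5 * C @ D2 @ C

print(np.allclose(B, X @ X.T))                               # True
```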
What does it mean to have this B?
This means that, if given a distance matrix D and asked to find out a set of points that give this distance matrix, then what we need is exactly to construct such a B matrix and take its eigendecomposition to get X!
This is exactly how the Torgerson method for Euclidean-distance PCoA works:
- Get the squared distance matrix and double-center it to get the matrix B, which is symmetric and positive semidefinite (PSD).
- Eigendecompose $B = U \Lambda U^T$ (more details can be found in my previous post) and get the configuration $X = U \Lambda^{1/2}$.
Note that U, which contains the orthonormal eigenvectors of B as its columns, has its inverse equal to its transpose because B is symmetric; being PSD additionally guarantees the eigenvalues in $\Lambda$ are nonnegative, so $\Lambda^{1/2}$ is real.
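To make these two steps concrete, here is a minimal NumPy sketch of the procedure; the function name classical_scaling and the toy data are my own, not from any library.

```python
# A minimal sketch of Torgerson classical scaling (Euclidean-distance PCoA),
# assuming D is an n x n matrix of pairwise Euclidean distances.
import numpy as np

def classical_scaling(D, r=2):
    n = D.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    B = -0.5 * C @ (D ** 2) @ C                    # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)           # B is symmetric PSD
    order = np.argsort(eigvals)[::-1]              # sort eigenvalues in decreasing order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scale = np.sqrt(np.clip(eigvals[:r], 0, None)) # keep the top r dimensions
    return eigvecs[:, :r] * scale                  # X = U * Lambda^(1/2)

# Usage: recover a configuration from distances, then check it reproduces them.
rng = np.random.default_rng(1)
points = rng.normal(size=(6, 2))
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
X = classical_scaling(D, r=2)
D_hat = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
print(np.allclose(D, D_hat))                       # True (up to numerical precision)
```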
The more general optimization strategy
A more general method for PCoA, which works not only for Euclidean distance but also for other distances, is the iterative method, which solves the problem via an optimization strategy: minimizing a cost function.
For Euclidean distance, this objective function is called strain:
$$\text{Strain}(x_1, \dots, x_n) = \left( \frac{\sum_{i,j} \bigl(b_{ij} - \langle x_i, x_j \rangle\bigr)^2}{\sum_{i,j} b_{ij}^2} \right)^{1/2},$$
where $b_{ij}$ are the entries of the double-centered matrix $B$.
For general distances, the objective function that the iterative method minimizes is called stress:
$$\text{Stress}(x_1, \dots, x_n) = \left( \frac{\sum_{i<j} \bigl(d_{ij} - \lVert x_i - x_j \rVert\bigr)^2}{\sum_{i<j} d_{ij}^2} \right)^{1/2}.$$
The iterative method to minimize stress is sometimes called stress majorization. In the future, I may write about the detailed optimization strategy, but for now, let’s just accept it.
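In practice, we rarely implement this iteration by hand; for instance, scikit-learn’s MDS estimator minimizes stress with SMACOF. Below is a hedged sketch assuming a precomputed distance matrix built from made-up data (exact defaults may vary across library versions).

```python
# A sketch of metric MDS via stress minimization (SMACOF) in scikit-learn,
# applied to a precomputed distance matrix of made-up data.
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(2)
points = rng.normal(size=(10, 4))
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

mds = MDS(n_components=2, metric=True, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)        # 10 x 2 low-dimensional configuration
print(mds.stress_)                   # final value of the stress objective
```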
PCoA==MDS (sometimes)
One thing to note is that PCoA is also known as metric MDS; the two are exactly the same thing under different names. The Euclidean-distance PCoA corresponds to classical MDS (cMDS). Usually, when people mention MDS, they are referring to metric MDS, so you may sometimes see people call PCoA and MDS the same thing. However, there is also a non-metric MDS approach, which we will discuss shortly.
Moreover, PCoA with Euclidean distances == classical MDS == PCA.
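To see this equivalence numerically, here is a small sketch (NumPy and scikit-learn, on made-up data) checking that PCA scores and Euclidean-distance PCoA coordinates agree up to sign flips of the axes.

```python
# Numerical check: PCA scores vs. classical scaling on Euclidean distances.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
X = rng.normal(size=(8, 5))
pca_scores = PCA(n_components=2).fit_transform(X)

# Classical scaling (PCoA) on the Euclidean distance matrix of the same data.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
n = X.shape[0]
C = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * C @ (D ** 2) @ C
w, V = np.linalg.eigh(B)
top = np.argsort(w)[::-1][:2]
pcoa_coords = V[:, top] * np.sqrt(w[top])

# Each axis is defined only up to sign, so compare absolute values.
print(np.allclose(np.abs(pca_scores), np.abs(pcoa_coords), atol=1e-6))  # True
```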
Now, let’s move on to discuss a wider range of methods known as multidimensional scaling (MDS), which include the PCoA method we just talked about.
MDS
While the term PCoA is more often used in biological applications, MDS is widely used everywhere.
As we have just discussed, one type of MDS, metric multidimensional scaling (mMDS), is exactly PCoA, which includes the special case of classical MDS (cMDS, equivalent to PCA).
In this section, we briefly talk about the other type of MDS: non-metric multidimensional scaling (nMDS), where the dissimilarities are no longer quantitative but qualitative (e.g., ordinal).
Non-metric MDS
First, let’s rewrite the optimization objective for MDS in a more intuitive way:
$$\min \sum_{i<j} \bigl(d_{ij} - \hat{d}_{ij}\bigr)^2,$$
where $\hat{d}_{ij} = \lVert x_i - x_j \rVert$ is the distance between objects $i$ and $j$ in the low-dimensional space.
This expression directly shows the goal of MDS: obtain a set of low-dimensional distances that best captures the original distances.
Non-metric MDS, instead of trying to approximate the dissimilarities themselves, approximates a nonlinear but monotonic transformation of them. Monotonicity ensures that larger or smaller distances on a plot of the output correspond to larger or smaller dissimilarities, respectively. However, nonlinearity implies that the method only attempts to preserve the ordering of dissimilarities.
It seeks to find
$$\hat{d}_{ij} = f(d_{ij}),$$
where the $\hat{d}_{ij}$ are called disparities and $f$ is only implicitly defined; $f$ preserves the order of the $d$’s, which means it satisfies
$$d_{ij} < d_{kl} \;\Rightarrow\; f(d_{ij}) \le f(d_{kl}).$$
The most common approach for obtaining the coordinates of objects based only on rank order information is through an iterative process referred to as the Shepard-Kruskal approach.
The Shepard-Kruskal approach
- In the initial phase, we calculate Euclidean distances d* from an arbitrarily chosen initial configuration in dimension r as the approximated dissimilarities.
- In the nonmetric phase, we determine the disparities from the initial distances by constructing a monotone regression relationship between the approximated dissimilarities and the original dissimilarities. The function f works as if it were a regression, where the approximated dissimilarities d* are the response variables, the rank order of the dissimilarities is the explanatory variable, and the disparities f(d) are the predicted values from the regression.
- In the metric phase, the configuration is updated so that the new approximated dissimilarities d* are the disparities f(d) obtained from step 2.
This process is repeated until a desired approximation is obtained.
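The Shepard-Kruskal iteration is also available off the shelf; for example, scikit-learn’s MDS with metric=False performs non-metric MDS (to my understanding, the disparities are obtained by isotonic regression inside the SMACOF loop). Here is a minimal sketch on made-up dissimilarities where only the rank order should matter.

```python
# A sketch of non-metric MDS in scikit-learn on a precomputed dissimilarity matrix.
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(4)
points = rng.normal(size=(12, 6))
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
D_ordinal = D ** 3                   # a monotone transform: same rank order as D

nmds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
           n_init=4, max_iter=300, random_state=0)
coords = nmds.fit_transform(D_ordinal)
print(coords.shape, nmds.stress_)    # low-dimensional configuration and final stress
```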
Advantages and disadvantages of non-metric MDS
In my opinion, the advantages and disadvantages of non-metric MDS, when compared to metric MDS, are intertwined.
- Dissimilarities only need to be known by their rank order, so it is the only choice if that’s all we know about our data. It is more robust to outliers and noise in the data since it only relies on the rank order. However, it can also lose some information about the magnitude and direction of the dissimilarities between the data points.
- It can capture non-linear relationships between data.
- It makes few assumptions about the nature of the data, so it can be well-suited for certain types of data, but not as well-suited as mMDS for others (for instance, data with linear relationships).
- It allows flexibility in the choice of dissimilarity measure, so we are free to use any type of distance-like dissimilarity. This also adds to its ability to handle certain types of data.
- It is a numerical technique that iteratively seeks a solution and stops computation when an acceptable solution has been found, or when a maximum number of iterations is reached. The numerical optimization can be time-consuming (although metric MDS can also be computationally expensive).
- It is somewhat harder to interpret than metric MDS.
Overall, these are the topics I wanted to discuss regarding PCoA and MDS. In the next post, I will move on to FA (factor analysis) and CA (correspondence analysis).
Here are my other posts in this series if you are interested:
- Linear algebra review & PCA: Demystify dimensionality reduction techniques (1): Principal Component Analysis
- FA: Demystify dimensionality reduction techniques (3): Factor Analysis
- MFA: Demystify dimensionality reduction techniques (4): Multiple Factor Analysis
- Practical dimensionality reduction in R and Python: TBC