Factor Analysis (FA) - demystify dimensionality reduction techniques (3)
In my previous posts, I discussed PCA, PCoA, and MDS. In this post, I will explore a slightly different type of dimensionality reduction technique: FA, factor analysis. This technique is not used very often in population genetics, but it plays an essential role in omics analysis, cultural studies, and many other fields. The figure below is the Inglehart–Welzel cultural map of the world (2023), produced from a factor analysis of data from the World Values Survey.
Suppose the original data have m observable variables (or traits). The goal of factor analysis is then to model the observed variables and their covariance structure in terms of p (unobservable) latent “factors”, with $p \ll m$.
The major difference between PCA and FA is that PCA creates new attributes (linear combinations of the observed variables, i.e., the original attributes) that best capture the variance in the observed variables, while FA models the observed variables as linear combinations of latent factors. As a result, latent factors from FA can sometimes be interpreted more intuitively than the new attributes from PCA.
Factor Model
Let’s first take a look at the mathematical model for factor analysis.
Denote the observed variables for a single individual as a random vector x, where
$$x = (x_1, x_2, \ldots, x_m)^\top,$$
with a population mean of
$$\mu = E[x] = (\mu_1, \mu_2, \ldots, \mu_m)^\top.$$
Suppose there are p unobservable common factors, denoted in a vector f of size p×1 as
$$f = (f_1, f_2, \ldots, f_p)^\top,$$
and m specific factors (or unique factors), one for each variable, denoted in a vector u of size m×1 as
$$u = (u_1, u_2, \ldots, u_m)^\top.$$
Then the observed variables for an individual can be modeled as linear combinations of common factors plus specific factors and population means (factor model):
$$x = \mu + \Lambda f + u.$$
Here, Λ is an m×p matrix which contains the factor loadings, where the loading of the i-th variable on the j-th factor is $\lambda_{ij}$, the (i, j) entry of Λ.
Model Assumptions
The factor model has the following assumptions:
1. Common factors and specific factors both have zero mean:
$$E[f] = 0, \qquad E[u] = 0.$$
These ensure that $E[x] = \mu$.
2. Common factors all have variance 1, and each specific factor has its own specific variance: $\mathrm{Var}(f_j) = 1$ for $j = 1, \ldots, p$, and $\mathrm{Var}(u_i) = \psi_i$ for $i = 1, \ldots, m$.
3. The covariances between common factors, between specific factors, and between specific and common factors are all zero.
These assumptions on variance and covariance can be summarized as
$$\mathrm{Cov}(f) = I_p, \qquad \mathrm{Cov}(u) = \Psi = \mathrm{diag}(\psi_1, \ldots, \psi_m), \qquad \mathrm{Cov}(f, u) = 0.$$
With these assumptions, the variance of the i-th variable in the random vector x, where
$$x_i = \mu_i + \sum_{j=1}^{p} \lambda_{ij} f_j + u_i,$$
is
$$\mathrm{Var}(x_i) = \sum_{j=1}^{p} \lambda_{ij}^2 + \psi_i,$$
and the covariance between variables $x_i$ and $x_k$ ($i \neq k$) is
$$\mathrm{Cov}(x_i, x_k) = \sum_{j=1}^{p} \lambda_{ij} \lambda_{kj}.$$
The variance of the i-th variable is essentially split into two parts. The first part,
$$h_i^2 = \sum_{j=1}^{p} \lambda_{ij}^2,$$
is called the communality, which is the variability in the variable that is shared with other variables via the common factors. Note that
$$\mathrm{Var}(x_i) = h_i^2 + \psi_i.$$
The second part is the specific/unique variance, $\psi_i$, which is the variability not shared with other variables.
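As a quick numerical illustration (the loadings here are made up, not taken from any real data set), suppose a variable $x_i$ loads on $p = 2$ common factors with $\lambda_{i1} = 0.8$ and $\lambda_{i2} = 0.3$, and has specific variance $\psi_i = 0.27$. Then
$$\mathrm{Var}(x_i) = \underbrace{0.8^2 + 0.3^2}_{h_i^2 \,=\, 0.73} + \underbrace{0.27}_{\psi_i} = 1.00,$$
i.e., 73% of this variable's variance is shared through the common factors and 27% is unique to it.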
We can also write the variance-covariance matrix (m×m) as
$$\Sigma = \mathrm{Cov}(x) = \Lambda \Lambda^\top + \Psi.$$
If the variance-covariance matrix can be decomposed into this form, then the factor model holds for x. f and u are not uniquely determined by x, though, which we will discuss more later.
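To make the model concrete, here is a minimal simulation sketch in Python/NumPy. All the numbers (loadings, means, specific variances) are made up purely for illustration, and the factors are drawn from standard normal distributions only for convenience; the sketch simply generates data from $x = \mu + \Lambda f + u$ and checks that the sample covariance is close to $\Lambda\Lambda^\top + \Psi$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical factor model with m = 4 observed variables and p = 2 common factors.
Lambda = np.array([[0.9, 0.1],
                   [0.8, 0.2],
                   [0.1, 0.7],
                   [0.2, 0.6]])                    # m x p factor loadings
mu = np.array([1.0, 2.0, 3.0, 4.0])                # population means
psi = np.array([0.3, 0.4, 0.5, 0.4])               # specific variances (diagonal of Psi)

n = 100_000
f = rng.standard_normal((n, 2))                    # common factors: mean 0, variance 1
u = rng.standard_normal((n, 4)) * np.sqrt(psi)     # specific factors: Var(u_i) = psi_i
X = mu + f @ Lambda.T + u                          # factor model: x = mu + Lambda f + u (rows = individuals)

Sigma_model = Lambda @ Lambda.T + np.diag(psi)     # implied covariance: Lambda Lambda^T + Psi
Sigma_sample = np.cov(X, rowvar=False)             # sample covariance of the simulated data
print(np.round(Sigma_model, 2))
print(np.round(Sigma_sample, 2))                   # close to Sigma_model for large n
```

Note that the rows of this simulated X are individuals; in the notation of the next section, which stacks individuals in columns, it corresponds to the transpose.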
Parameter Estimation
Now that we have set up the factor model, the next step is estimating its parameters, i.e., the entries in Λ and Ψ. Altogether there are mp + m parameters to estimate: mp entries in Λ and m diagonal entries in Ψ.
There are two commonly used methods for parameter estimation. The first one is through principal component analysis (PCA), and the second one is through maximum likelihood estimation (MLE).
Principal component analysis (PCA) approach
Suppose X is the data matrix of size m×n, which records the values of the m observed variables for n individuals. Each column of X is the vector of observed variables for one individual, i.e., the vector x from before.
The sample variance-covariance matrix of the data is
$$S = \frac{1}{n-1} \sum_{k=1}^{n} (x_k - \bar{x})(x_k - \bar{x})^\top,$$
where
$$\bar{x} = \frac{1}{n} \sum_{k=1}^{n} x_k.$$
Also, under the factor model, S is an estimate of $\Sigma = \Lambda\Lambda^\top + \Psi$.
Suppose the eigenvalues and corresponding eigenvectors of S are
$$\hat{\theta}_1 \ge \hat{\theta}_2 \ge \cdots \ge \hat{\theta}_m \ge 0 \qquad \text{and} \qquad \hat{e}_1, \hat{e}_2, \ldots, \hat{e}_m.$$
The first p eigenvalues and eigenvectors should capture a large portion of the variance, therefore approximating the variance-covariance matrix:
$$S \approx \sum_{j=1}^{p} \hat{\theta}_j\, \hat{e}_j \hat{e}_j^\top = \hat{\Lambda} \hat{\Lambda}^\top.$$
This gives us an estimator of the factor loadings, i.e., of the mp entries in Λ. The j-th column of the estimated factor loading matrix is
$$\hat{\Lambda}_{\cdot j} = \sqrt{\hat{\theta}_j}\, \hat{e}_j, \qquad j = 1, \ldots, p.$$
Using these, we can also estimate the m specific variances as
$$\hat{\psi}_i = s_{ii} - \sum_{j=1}^{p} \hat{\lambda}_{ij}^2, \qquad i = 1, \ldots, m,$$
where $s_{ii}$ is the i-th diagonal entry of S and $\hat{\lambda}_{ij}$ is the (i, j) entry of $\hat{\Lambda}$.
To evaluate the performance of the model when parameter estimation is done by PCA, we can calculate how much of the total variance is explained by the p factors as
$$\frac{\sum_{j=1}^{p} \hat{\theta}_j}{\sum_{i=1}^{m} s_{ii}} = \frac{\sum_{i=1}^{m} \hat{h}_i^2}{\operatorname{tr}(S)}, \qquad \text{where } \hat{h}_i^2 = \sum_{j=1}^{p} \hat{\lambda}_{ij}^2.$$
The numerator is the total communality.
One thing to note is that the factor loadings stay fixed no matter how many factors we choose, because they are all obtained from the PCA of the same sample variance-covariance matrix; different choices of p only change how many PCs are kept. The larger the number of factors p, the smaller the error in the approximation
$$S \approx \hat{\Lambda}\hat{\Lambda}^\top + \hat{\Psi},$$
but the dimensionality may not have been reduced to a desired level.
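Here is a minimal NumPy sketch of this PCA-based estimation, written for the m×n orientation used above; `fa_by_pca` is just an illustrative helper name, not any library's API:

```python
import numpy as np

def fa_by_pca(X, p):
    """PCA-based factor estimation.
    X: m x n data matrix (rows = observed variables, columns = individuals).
    p: number of common factors to keep."""
    S = np.cov(X)                                       # m x m sample covariance (rows are variables)
    theta, E = np.linalg.eigh(S)                        # eigh returns eigenvalues in ascending order
    theta, E = theta[::-1], E[:, ::-1]                  # reorder to descending

    Lambda_hat = E[:, :p] * np.sqrt(theta[:p])          # j-th column: sqrt(theta_j) * e_j
    psi_hat = np.diag(S) - (Lambda_hat**2).sum(axis=1)  # psi_i = s_ii - sum_j lambda_ij^2
    explained = theta[:p].sum() / np.trace(S)           # proportion of total variance explained
    return Lambda_hat, psi_hat, explained

# Usage with the simulated data from the earlier sketch (transposed to m x n):
# Lambda_hat, psi_hat, explained = fa_by_pca(X.T, p=2)
```

The estimated loadings may differ from the true ones by sign flips and rotation; that is exactly the non-uniqueness discussed in the factor rotation section below.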
Maximum likelihood estimation (MLE) approach
A different method to estimate the parameters of the factor model is MLE. Unlike the PCA approach, the factor loadings obtained with the MLE approach do change as the number of factors varies.
To perform maximum likelihood estimation, we assume that the data are independently sampled from a multivariate normal distribution, i.e., the m-dimensional random vector
$$x \sim N_m(\mu, \Sigma).$$
The probability density is
$$\phi(x) = (2\pi)^{-m/2}\, |\Sigma|^{-1/2} \exp\!\left(-\tfrac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right).$$
For n individuals, the vectors are
$$x_1, x_2, \ldots, x_n.$$
The joint probability density is
$$\prod_{k=1}^{n} \phi(x_k) = (2\pi)^{-nm/2}\, |\Sigma|^{-n/2} \exp\!\left(-\tfrac{1}{2}\sum_{k=1}^{n}(x_k - \mu)^\top \Sigma^{-1} (x_k - \mu)\right),$$
with
$$\Sigma = \Lambda\Lambda^\top + \Psi.$$
MLE looks for estimated parameters $\hat{\mu}$, $\hat{\Lambda}$, and $\hat{\Psi}$ which together maximize the log likelihood of the data, i.e., the log of the joint probability density.
The log likelihood of an observed vector x is
$$\ln \phi(x) = -\frac{m}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma| - \frac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu).$$
The log likelihood of the data is therefore
$$\ell(\mu, \Lambda, \Psi) = \sum_{k=1}^{n} \ln \phi(x_k) = -\frac{nm}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma| - \frac{1}{2}\sum_{k=1}^{n}(x_k - \mu)^\top \Sigma^{-1} (x_k - \mu).$$
The parameters that maximize this log likelihood do not have a closed-form solution, but they can be found with iterative methods. Moreover, the solutions will not be unique unless we enforce a further constraint, for instance that
$$\Lambda^\top \Psi^{-1} \Lambda$$
is a diagonal matrix. More about the uniqueness of the estimated parameters will be discussed later.
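In practice, the iterative optimization is rarely coded by hand. As a sketch (not a line-by-line implementation of the derivation above), scikit-learn's `FactorAnalysis` estimates the loadings and specific variances by maximizing this likelihood with an iterative algorithm; it expects individuals in rows, so the m×n matrix above would be passed transposed, and the fitted loadings depend on which identifiability constraint the implementation uses:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 6))        # placeholder data: 500 individuals, m = 6 variables
                                         # (substitute your own n x m data matrix here)

fa = FactorAnalysis(n_components=2)      # fit p = 2 common factors by maximum likelihood
fa.fit(X)

Lambda_hat = fa.components_.T            # m x p estimated factor loadings
psi_hat = fa.noise_variance_             # m estimated specific variances
Sigma_hat = Lambda_hat @ Lambda_hat.T + np.diag(psi_hat)   # model-implied covariance
print(np.round(Sigma_hat, 2))
```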
One advantage of the MLE approach over the PCA approach is that it provides a natural goodness-of-fit test: a likelihood ratio test comparing the variance-covariance matrix under the assumed p-factor model, $\hat{\Lambda}\hat{\Lambda}^\top + \hat{\Psi}$, with the one obtained from the data.
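One common form of this likelihood ratio statistic (with Bartlett's correction; shown here as a sketch, since textbooks differ slightly in the exact correction factor) is
$$\left(n - 1 - \frac{2m + 4p + 5}{6}\right)\ln\frac{\bigl|\hat{\Lambda}\hat{\Lambda}^\top + \hat{\Psi}\bigr|}{|S_n|},$$
where $S_n$ is the sample covariance with divisor n. Under the null hypothesis that the p-factor model is adequate, it is approximately $\chi^2$-distributed with $\bigl[(m - p)^2 - m - p\bigr]/2$ degrees of freedom, and large values reject the model.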
Factor rotations
When p>1, the model is ambiguous. Suppose Q is a p×p orthogonal matrix; then
$$x = \mu + \Lambda f + u = \mu + (\Lambda Q)(Q^\top f) + u = \mu + \Lambda^{*} f^{*} + u,$$
where
$$\Lambda^{*} = \Lambda Q \qquad \text{and} \qquad f^{*} = Q^\top f$$
are also valid factor loadings and common factors (they satisfy the same model assumptions, since $\mathrm{Cov}(f^{*}) = Q^\top Q = I_p$). These two sets of factors can be transformed from one to another through the orthogonal matrix Q.
To get a unique set of factors, we can perform factor rotation. There are different methods for factor rotation.
The most popular one is varimax rotation, which maximizes the sum of the variances of the squared loadings. The rotation leaves the underlying coordinate system unchanged apart from an orthogonal transformation. Recall that the factor loadings are the loadings of variables on factors. The goal of this rotation is to let each variable have a high loading on a single factor but small loadings on the others, and to let each factor have only a few variables with high loadings on it while the remaining variables have very small loadings. Intuitively, varimax rotation looks for a representation of the data in which each variable is well described by a linear combination of a few factors.
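Below is a compact NumPy sketch of varimax rotation using the standard SVD-based iteration; `varimax` is just an illustrative helper name, not a particular library's API. It takes an m×p loading matrix (for example the $\hat{\Lambda}$ estimated earlier) and returns the rotated loadings together with the orthogonal rotation matrix Q:

```python
import numpy as np

def varimax(Lambda, max_iter=100, tol=1e-8):
    """Orthogonally rotate an m x p loading matrix to (approximately)
    maximize the varimax criterion.  Returns (rotated loadings, Q)."""
    m, p = Lambda.shape
    Q = np.eye(p)
    crit_old = 0.0
    for _ in range(max_iter):
        L = Lambda @ Q
        # gradient of the varimax criterion with respect to the rotation
        G = Lambda.T @ (L**3 - L * (L**2).sum(axis=0) / m)
        U, s, Vt = np.linalg.svd(G)
        Q = U @ Vt                       # project back onto orthogonal matrices
        crit_new = s.sum()
        if crit_new - crit_old < tol:    # stop when the criterion no longer improves
            break
        crit_old = crit_new
    return Lambda @ Q, Q
```

Since Q is orthogonal, the rotated loadings reproduce exactly the same covariance matrix, $(\hat{\Lambda}Q)(\hat{\Lambda}Q)^\top + \hat{\Psi} = \hat{\Lambda}\hat{\Lambda}^\top + \hat{\Psi}$; rotation changes interpretability, not fit.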
Other rotation methods include quartimax, which minimizes the number of factors needed to explain each variable; equimax, which is a compromise between varimax and quartimax; and several more.
A challenge in factor rotation, and more broadly in factor analysis, is how to interpret the factor structure appropriately. A variable may load on multiple factors, and different rotation methods may result in completely different patterns of loadings, which can lead to different interpretations of the factors.
When there is more than one data set (or more than one group of variables), extracting latent factors from the data requires multiple factor analysis (MFA). In the next post, I will walk through the mathematics behind MFA.
Reference materials
- Mardia, Kantilal Varichand, John T. Kent, and John M. Bibby. “Multivariate analysis.” Probability and mathematical statistics (1979).
Here are my other posts in this series if you are interested:
- Linear algebra review & PCA: Demystify dimensionality reduction techniques (1): Principal Component Analysis
- PCoA & MDS: Demystify dimensionality reduction techniques (2): PCoA & Multidimensional Scaling
- MFA: Demystify dimensionality reduction techniques (4): Multiple Factor Analysis
- Practical dimensionality reduction in R and Python: TBC