Factor Analysis (FA) - demystify dimensionality reduction techniques (3)
In my previous posts, I discussed PCA, PCoA, and MDS. In this post, I will explore a slightly different type of dimensionality reduction technique: FA, factor analysis. This technique is not used very often in population genetics, but it plays an essential role in omics analysis, cultural studies, and many other fields. The figure below is the Inglehart–Welzel cultural map of the world (2023), produced from a factor analysis of data from the World Values Survey.
Suppose the original data have m observable variables (or traits). The goal of factor analysis is then to model the observed variables and their covariance structure in terms of p (unobservable) latent “factors”, with $p \ll m$.
The major difference between PCA and FA is that PCA creates new attributes (linear combinations of the observed variables, i.e., the original attributes) that best capture the variance in the observed variables, while FA models the observed variables as linear combinations of latent factors. As a result, latent factors from FA can sometimes be interpreted more intuitively than the new attributes from PCA.
Factor Model
Let’s first take a look at the mathematical model for factor analysis.
Denote the observed variables for a single individual as a random vector x, where
$$x = (x_1, x_2, \ldots, x_m)^\top,$$
with a population mean of
$$\mu = E[x] = (\mu_1, \mu_2, \ldots, \mu_m)^\top.$$
Suppose there are p unobservable common factors, denoted in a vector f of size p×1 as
$$f = (f_1, f_2, \ldots, f_p)^\top,$$
and m specific factors (or unique factors), one for each variable, denoted in a vector u of size m×1 as
$$u = (u_1, u_2, \ldots, u_m)^\top.$$
Then the observed variables for an individual can be modeled as linear combinations of common factors plus specific factors and population means (factor model):
$$x = \mu + \Lambda f + u.$$
Here, Λ is an m×p matrix which contains the factor loadings, where the loading of the i-th variable on the j-th factor is $\lambda_{ij}$, the (i, j) entry of Λ.
Model Assumptions
The factor model has the following assumptions:
1. Common factors and specific factors both have zero mean:
$$E[f] = 0, \qquad E[u] = 0.$$
These ensure that $E[x] = \mu$.
2. Common factors all have variance 1, and each specific factor has its own specific variance: $\mathrm{Var}(f_j) = 1$ for $j = 1, \ldots, p$, and $\mathrm{Var}(u_i) = \psi_i$ for $i = 1, \ldots, m$.
3. The covariances between common factors, between specific factors, and between specific and common factors are all zero.
These assumptions on variance and covariance can be summarized as
$$\mathrm{Cov}(f) = I_p, \qquad \mathrm{Cov}(u) = \Psi = \mathrm{diag}(\psi_1, \ldots, \psi_m), \qquad \mathrm{Cov}(f, u) = 0.$$
With these assumptions, the variance of the i-th variable in the random vector x, where
$$x_i = \mu_i + \sum_{j=1}^{p} \lambda_{ij} f_j + u_i,$$
is
$$\mathrm{Var}(x_i) = \sum_{j=1}^{p} \lambda_{ij}^2 + \psi_i,$$
and the covariance between variables $x_i$ and $x_k$ ($i \neq k$) is
$$\mathrm{Cov}(x_i, x_k) = \sum_{j=1}^{p} \lambda_{ij} \lambda_{kj}.$$
The variance of the i-th variable is essentially split into two parts. The first part,
$$h_i^2 = \sum_{j=1}^{p} \lambda_{ij}^2,$$
is called the communality, which is the variability in the variable that is shared with other variables via the common factors. Note that
$$\mathrm{Var}(x_i) = h_i^2 + \psi_i.$$
The second part is the specific/unique variance, $\psi_i$, which is the variability not shared with other variables.
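As a quick numerical illustration (the loadings here are made up, not taken from any real data set), suppose a variable $x_i$ loads on $p = 2$ common factors with $\lambda_{i1} = 0.8$ and $\lambda_{i2} = 0.3$, and has specific variance $\psi_i = 0.27$. Then
$$\mathrm{Var}(x_i) = \underbrace{0.8^2 + 0.3^2}_{h_i^2 \,=\, 0.73} + \underbrace{0.27}_{\psi_i} = 1.00,$$
i.e., 73% of this variable's variance is shared through the common factors and 27% is unique to it.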
We can also write the variance-covariance matrix (m×m) as
$$\Sigma = \mathrm{Cov}(x) = \Lambda \Lambda^\top + \Psi.$$
If the variance-covariance matrix can be decomposed into this form, then the factor model holds for x. f and u are not uniquely determined by x, though, which we will discuss more later.
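To make the model concrete, here is a minimal simulation sketch in Python/NumPy. All the numbers (loadings, means, specific variances) are made up purely for illustration, and the factors are drawn from standard normal distributions only for convenience; the sketch simply generates data from $x = \mu + \Lambda f + u$ and checks that the sample covariance is close to $\Lambda\Lambda^\top + \Psi$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical factor model with m = 4 observed variables and p = 2 common factors.
Lambda = np.array([[0.9, 0.1],
                   [0.8, 0.2],
                   [0.1, 0.7],
                   [0.2, 0.6]])                    # m x p factor loadings
mu = np.array([1.0, 2.0, 3.0, 4.0])                # population means
psi = np.array([0.3, 0.4, 0.5, 0.4])               # specific variances (diagonal of Psi)

n = 100_000
f = rng.standard_normal((n, 2))                    # common factors: mean 0, variance 1
u = rng.standard_normal((n, 4)) * np.sqrt(psi)     # specific factors: Var(u_i) = psi_i
X = mu + f @ Lambda.T + u                          # factor model: x = mu + Lambda f + u (rows = individuals)

Sigma_model = Lambda @ Lambda.T + np.diag(psi)     # implied covariance: Lambda Lambda^T + Psi
Sigma_sample = np.cov(X, rowvar=False)             # sample covariance of the simulated data
print(np.round(Sigma_model, 2))
print(np.round(Sigma_sample, 2))                   # close to Sigma_model for large n
```

Note that the rows of this simulated X are individuals; in the notation of the next section, which stacks individuals in columns, it corresponds to the transpose.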
Parameter Estimation
Now that we have set up the factor model, the next step is estimating its parameters, i.e., the entries in Λ and Ψ. Altogether there are mp + m parameters to estimate: mp entries in Λ and m diagonal entries in Ψ.
There are two commonly used methods for parameter estimation. The first one is through principal component analysis (PCA), and the second one is through maximum likelihood estimation (MLE).
Principal component analysis (PCA) approach
Suppose X is the data matrix of size m×n, which records the values of the m observed variables for n individuals. Each column of X is the vector of observed variables for one individual, i.e., the vector x from before.
The sample variance-covariance matrix of the data is
$$S = \frac{1}{n-1} \sum_{k=1}^{n} (x_k - \bar{x})(x_k - \bar{x})^\top,$$
where
$$\bar{x} = \frac{1}{n} \sum_{k=1}^{n} x_k.$$
Also, under the factor model, S is an estimate of $\Sigma = \Lambda\Lambda^\top + \Psi$.
Suppose the eigenvalues and corresponding eigenvectors of S are
$$\hat{\theta}_1 \ge \hat{\theta}_2 \ge \cdots \ge \hat{\theta}_m \ge 0 \qquad \text{and} \qquad \hat{e}_1, \hat{e}_2, \ldots, \hat{e}_m.$$
The first p eigenvalues and eigenvectors should capture a large portion of the variance, therefore approximating the variance-covariance matrix:
$$S \approx \sum_{j=1}^{p} \hat{\theta}_j\, \hat{e}_j \hat{e}_j^\top = \hat{\Lambda} \hat{\Lambda}^\top.$$
This gives us an estimator of the factor loadings, i.e., of the mp entries in Λ. The j-th column of the estimated factor loading matrix is
$$\hat{\Lambda}_{\cdot j} = \sqrt{\hat{\theta}_j}\, \hat{e}_j, \qquad j = 1, \ldots, p.$$
Using these, we can also estimate the m specific variances as
$$\hat{\psi}_i = s_{ii} - \sum_{j=1}^{p} \hat{\lambda}_{ij}^2, \qquad i = 1, \ldots, m,$$
where $s_{ii}$ is the i-th diagonal entry of S and $\hat{\lambda}_{ij}$ is the (i, j) entry of $\hat{\Lambda}$.
To evaluate the performance of the model when parameter estimation is done by PCA, we can calculate how much of the total variance is explained by the p factors as
$$\frac{\sum_{j=1}^{p} \hat{\theta}_j}{\sum_{i=1}^{m} s_{ii}} = \frac{\sum_{i=1}^{m} \hat{h}_i^2}{\operatorname{tr}(S)}, \qquad \text{where } \hat{h}_i^2 = \sum_{j=1}^{p} \hat{\lambda}_{ij}^2.$$
The numerator is the total communality.
One thing to note is that the factor loadings stay fixed no matter how many factors we choose, because they are all obtained from the PCA of the same sample variance-covariance matrix; different choices of p only change how many PCs are kept. The larger the number of factors p, the smaller the error in the approximation
$$S \approx \hat{\Lambda}\hat{\Lambda}^\top + \hat{\Psi},$$
but the dimensionality may not have been reduced to a desired level.
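Here is a minimal NumPy sketch of this PCA-based estimation, written for the m×n orientation used above; `fa_by_pca` is just an illustrative helper name, not any library's API:

```python
import numpy as np

def fa_by_pca(X, p):
    """PCA-based factor estimation.
    X: m x n data matrix (rows = observed variables, columns = individuals).
    p: number of common factors to keep."""
    S = np.cov(X)                                       # m x m sample covariance (rows are variables)
    theta, E = np.linalg.eigh(S)                        # eigh returns eigenvalues in ascending order
    theta, E = theta[::-1], E[:, ::-1]                  # reorder to descending

    Lambda_hat = E[:, :p] * np.sqrt(theta[:p])          # j-th column: sqrt(theta_j) * e_j
    psi_hat = np.diag(S) - (Lambda_hat**2).sum(axis=1)  # psi_i = s_ii - sum_j lambda_ij^2
    explained = theta[:p].sum() / np.trace(S)           # proportion of total variance explained
    return Lambda_hat, psi_hat, explained

# Usage with the simulated data from the earlier sketch (transposed to m x n):
# Lambda_hat, psi_hat, explained = fa_by_pca(X.T, p=2)
```

The estimated loadings may differ from the true ones by sign flips and rotation; that is exactly the non-uniqueness discussed in the factor rotation section below.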
Maximum likelihood estimation (MLE) approach
A different method to estimate the parameters of the factor model is MLE. Unlike the PCA approach, the factor loadings obtained with the MLE approach do change as the number of factors varies.
To perform maximum likelihood estimation, we assume that the data are independently sampled from a multivariate normal distribution, i.e., the m-dimensional random vector
$$x \sim N_m(\mu, \Sigma).$$
The probability density is
$$\phi(x) = (2\pi)^{-m/2}\, |\Sigma|^{-1/2} \exp\!\left(-\tfrac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right).$$
For n individuals, the vectors are
$$x_1, x_2, \ldots, x_n.$$
The joint probability density is
$$\prod_{k=1}^{n} \phi(x_k) = (2\pi)^{-nm/2}\, |\Sigma|^{-n/2} \exp\!\left(-\tfrac{1}{2}\sum_{k=1}^{n}(x_k - \mu)^\top \Sigma^{-1} (x_k - \mu)\right),$$
with
$$\Sigma = \Lambda\Lambda^\top + \Psi.$$
MLE looks for estimated parameters $\hat{\mu}$, $\hat{\Lambda}$, and $\hat{\Psi}$ which together maximize the log likelihood of the data, i.e., the log of the joint probability density.
The log likelihood of an observed vector x is
$$\ln \phi(x) = -\frac{m}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma| - \frac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu).$$
The log likelihood of the data is therefore
$$\ell(\mu, \Lambda, \Psi) = \sum_{k=1}^{n} \ln \phi(x_k) = -\frac{nm}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma| - \frac{1}{2}\sum_{k=1}^{n}(x_k - \mu)^\top \Sigma^{-1} (x_k - \mu).$$
The parameters that maximize this log likelihood do not have a closed-form solution, but they can be found with iterative methods. Moreover, the solutions will not be unique unless we enforce a further constraint, for instance that
$$\Lambda^\top \Psi^{-1} \Lambda$$
is a diagonal matrix. More about the uniqueness of the estimated parameters will be discussed later.
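In practice, the iterative optimization is rarely coded by hand. As a sketch (not a line-by-line implementation of the derivation above), scikit-learn's `FactorAnalysis` estimates the loadings and specific variances by maximizing this likelihood with an iterative algorithm; it expects individuals in rows, so the m×n matrix above would be passed transposed, and the fitted loadings depend on which identifiability constraint the implementation uses:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 6))        # placeholder data: 500 individuals, m = 6 variables
                                         # (substitute your own n x m data matrix here)

fa = FactorAnalysis(n_components=2)      # fit p = 2 common factors by maximum likelihood
fa.fit(X)

Lambda_hat = fa.components_.T            # m x p estimated factor loadings
psi_hat = fa.noise_variance_             # m estimated specific variances
Sigma_hat = Lambda_hat @ Lambda_hat.T + np.diag(psi_hat)   # model-implied covariance
print(np.round(Sigma_hat, 2))
```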
One advantage of the MLE approach over the PCA approach is that it provides a natural goodness-of-fit test: a likelihood ratio test comparing the variance-covariance matrix under the assumed p-factor model, $\hat{\Lambda}\hat{\Lambda}^\top + \hat{\Psi}$, with the one obtained from the data.
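One common form of this likelihood ratio statistic (with Bartlett's correction; shown here as a sketch, since textbooks differ slightly in the exact correction factor) is
$$\left(n - 1 - \frac{2m + 4p + 5}{6}\right)\ln\frac{\bigl|\hat{\Lambda}\hat{\Lambda}^\top + \hat{\Psi}\bigr|}{|S_n|},$$
where $S_n$ is the sample covariance with divisor n. Under the null hypothesis that the p-factor model is adequate, it is approximately $\chi^2$-distributed with $\bigl[(m - p)^2 - m - p\bigr]/2$ degrees of freedom, and large values reject the model.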
Factor rotations
When p>1, the model is ambiguous. Suppose Q is a p×p orthogonal matrix; then
$$x = \mu + \Lambda f + u = \mu + (\Lambda Q)(Q^\top f) + u = \mu + \Lambda^{*} f^{*} + u,$$
where
$$\Lambda^{*} = \Lambda Q \qquad \text{and} \qquad f^{*} = Q^\top f$$
are also valid factor loadings and common factors (they satisfy the same model assumptions, since $\mathrm{Cov}(f^{*}) = Q^\top Q = I_p$). These two sets of factors can be transformed from one to another through the orthogonal matrix Q.
To get a unique set of factors, we can perform factor rotation. There are different methods for factor rotation.
The most popular one is varimax rotation, which maximizes the sum of the variances of the squared loadings. The rotation leaves the underlying coordinate system unchanged apart from an orthogonal transformation. Recall that the factor loadings are the loadings of variables on factors. The goal of this rotation is to let each variable have a high loading on a single factor but small loadings on the others, and to let each factor have only a few variables with high loadings on it while the remaining variables have very small loadings. Intuitively, varimax rotation looks for a representation of the data in which each variable is well described by a linear combination of a few factors.
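Below is a compact NumPy sketch of varimax rotation using the standard SVD-based iteration; `varimax` is just an illustrative helper name, not a particular library's API. It takes an m×p loading matrix (for example the $\hat{\Lambda}$ estimated earlier) and returns the rotated loadings together with the orthogonal rotation matrix Q:

```python
import numpy as np

def varimax(Lambda, max_iter=100, tol=1e-8):
    """Orthogonally rotate an m x p loading matrix to (approximately)
    maximize the varimax criterion.  Returns (rotated loadings, Q)."""
    m, p = Lambda.shape
    Q = np.eye(p)
    crit_old = 0.0
    for _ in range(max_iter):
        L = Lambda @ Q
        # gradient of the varimax criterion with respect to the rotation
        G = Lambda.T @ (L**3 - L * (L**2).sum(axis=0) / m)
        U, s, Vt = np.linalg.svd(G)
        Q = U @ Vt                       # project back onto orthogonal matrices
        crit_new = s.sum()
        if crit_new - crit_old < tol:    # stop when the criterion no longer improves
            break
        crit_old = crit_new
    return Lambda @ Q, Q
```

Since Q is orthogonal, the rotated loadings reproduce exactly the same covariance matrix, $(\hat{\Lambda}Q)(\hat{\Lambda}Q)^\top + \hat{\Psi} = \hat{\Lambda}\hat{\Lambda}^\top + \hat{\Psi}$; rotation changes interpretability, not fit.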
Other rotation methods include quartimax, which minimizes the number of factors needed to explain each variable; equimax, which is a compromise between varimax and quartimax; and several more.
A challenge in factor rotation, and more broadly in factor analysis, is how to interpret the factor structure appropriately. A variable may load on multiple factors, and different rotation methods may result in completely different patterns of loadings, which can lead to different interpretations of the factors.
When there is more than one data set (or more than one group of variables), extracting latent factors from the data requires multiple factor analysis (MFA). In the next post, I will walk through the mathematics behind MFA.
Reference materials
- Mardia, Kantilal Varichand, John T. Kent, and John M. Bibby. “Multivariate analysis.” Probability and mathematical statistics (1979).
Here are my other posts in this series if you are interested:
- Linear algebra review & PCA: Demystify dimensionality reduction techniques (1): Principal Component Analysis
- PCoA & MDS: Demystify dimensionality reduction techniques (2): PCoA & Multidimensional Scaling
- MFA: Demystify dimensionality reduction techniques (4): Multiple Factor Analysis
- Practical dimensionality reduction in R and Python: TBC