Partial least squares (PLS) is a technique that fits
linear combinations of the independent variables, called factors,
to one or more dependent variables. The factors
are chosen to maximize the covariance between the
factors and the dependent variables.
Partial least squares is useful when the number of
independent variables is large compared to the number of
observations, or when variables are highly correlated.
Constructing Partial Least Squares Models
The PartialLeastSquaresModel
class has four constructors. The first constructor takes three arguments.
The first is a VectorT
that represents the dependent variable. The second is a parameter array
of vectors that represent the independent variables.
The last argument is the number of factors that should be computed.
This creates a Partial Least Squares model with one dependent variable,
sometimes called PLS1.
The second constructor takes a matrix instead of a vector as the first
argument. This constructs a multivariate PLS model, sometimes called
PLS2, where each column of the matrix represents
a dependent variable.
In the example below, we create two Partial Least Squares models
using random data. The first has one dependent variable, 10 independent
variables and 20 observations. The second has 3 dependent variables.
In both cases, we're asking for 5 factors:
var dependent = Vector.CreateRandom(20);
var independents = Matrix.CreateRandom(20, 10);
var model1 = new PartialLeastSquaresModel(dependent, independents, 5);
var dependents = Matrix.CreateRandom(20, 3);
var model2 = new PartialLeastSquaresModel(dependents, independents, 5);
Dim dependent = Vector.CreateRandom(20)
Dim independents = Matrix.CreateRandom(20, 10)
Dim model1 = New PartialLeastSquaresModel(dependent, independents, 5)
Dim dependents = Matrix.CreateRandom(20, 3)
Dim model2 = New PartialLeastSquaresModel(dependents, independents, 5)
let dependent = Vector.CreateRandom(20)
let independents = Matrix.CreateRandom(20, 10)
let model1 = new PartialLeastSquaresModel(dependent, independents, 5)
let dependents = Matrix.CreateRandom(20, 3)
let model2 = new PartialLeastSquaresModel(dependents, independents, 5)
The third constructor takes four arguments. The first argument is an
IDataFrame (a
DataFrame<R, C> or
Matrix<T>) that
contains the variables to be used in the regression. The second argument
is an array of strings containing the names of the dependent variables. The third argument
is an array of strings containing the names of the independent variables.
All the names must exist in the column index of the data frame specified
by the first argument. The last argument is once again the number of factors.
In the code that follows, we give the two matrices of dependent and
independent variables a column index. We join these matrices
to get a matrix that can act as a data frame. We then use this
matrix, along with the arrays of column names, to construct
the same PLS model:
var xNames = new string[] {
"x1", "x2", "x3", "x4", "x5",
"x6", "x7","x8", "x9", "x10" };
independents.ColumnIndex = Index.Create(xNames);
var yNames = new string[] { "y1", "y2", "y3" };
dependents.ColumnIndex = Index.Create(yNames);
var all = Matrix.JoinHorizontal(independents, dependents);
var model3 = new PartialLeastSquaresModel(all, yNames, xNames, 5);
Dim xNames = {
"x1", "x2", "x3", "x4", "x5",
"x6", "x7", "x8", "x9", "x10"}
independents.ColumnIndex = Index.Create(xNames)
Dim yNames = {"y1", "y2", "y3"}
dependents.ColumnIndex = Index.Create(yNames)
Dim all = Matrix.JoinHorizontal(independents, dependents)
Dim model3 = New PartialLeastSquaresModel(all, yNames, xNames, 5)
let xNames = [|
"x1"; "x2"; "x3"; "x4"; "x5";
"x6"; "x7";"x8"; "x9"; "x10"
|]
independents.ColumnIndex <- Index.Create(xNames)
let yNames = [| "y1"; "y2"; "y3" |]
dependents.ColumnIndex <- Index.Create(yNames)
let all = Matrix.JoinHorizontal(independents, dependents)
let model3 = new PartialLeastSquaresModel(all, yNames, xNames, 5)
The fourth constructor takes three arguments. The first argument once again
contains the data. The second is a string that contains a formula that
describes the model. See the section
on formulas
for details. The last argument is once again the number of factors.
The same model as above can be defined using a formula as:
var model4 = new PartialLeastSquaresModel(all, "y1 + y2 + y3 ~ .", 5);
Dim model4 = New PartialLeastSquaresModel(all, "y1 + y2 + y3 ~ .", 5)
let model4 = new PartialLeastSquaresModel(all, "y1 + y2 + y3 ~ .", 5)
We used the special . term in the right-hand
side to capture all remaining columns as independent variables.
The Fit
method performs the actual analysis. Most properties and methods throw an exception
when they are accessed before the
Fit
method is called. You can verify that the model has been calculated by inspecting the
Computed property.
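As a minimal C# sketch (building on the model1 object constructed earlier), you can
guard against querying a model that has not been fitted yet:
if (!model1.Computed)
    model1.Fit();
// The results of the analysis can now be queried safely:
Console.WriteLine(model1.Computed);   // prints: True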
Fitting the model is done with one of two standard algorithms:
NIPALS (Nonlinear Iterative PArtial Least Squares) or
SIMPLS (Statistically Inspired Modification of Partial Least Squares).
The two algorithms give identical results when there is
only one dependent variable.
By default, the NIPALS algorithm is used.
You can change this by setting the
Method
property. This property is of type
PartialLeastSquaresMethod
and can take on the following values:
Method | Description
---|---
Nipals | Use the original Nonlinear Iterative PArtial Least Squares (NIPALS) method.
Simpls | Use the Statistically Inspired Modification of Partial Least Squares (SIMPLS) method of de Jong.
The number of components to compute can be changed by setting the
NumberOfComponents
property. In the next example, we compute the first model we created
earlier using default settings. For the second model, we change
the number of requested components to 7 and compute the model
using the SIMPLS algorithm:
model1.Fit();
model2.NumberOfComponents = 7;
model2.Method = PartialLeastSquaresMethod.Simpls;
model2.Fit();
model1.Fit()
model2.NumberOfComponents = 7
model2.Method = PartialLeastSquaresMethod.Simpls
model2.Fit()
model1.Fit()
model2.NumberOfComponents <- 7
model2.Method <- PartialLeastSquaresMethod.Simpls
model2.Fit()
The PredictedValues
property returns a Matrix<T>
that contains the values of the dependent variables as predicted by the model.
The YResiduals
property returns a vector containing the differences between the actual and
the predicted values of the dependent variables. Both contain one entry
for each observation.
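As a brief illustration, the C# sketch below inspects the first fitted value and
residual of model1, which was fitted earlier (the indexers are used only to show
individual entries):
var fitted = model1.PredictedValues;    // predicted values, one row per observation
var residuals = model1.YResiduals;      // actual minus predicted values
Console.WriteLine("First predicted value: {0}", fitted[0, 0]);
Console.WriteLine("First residual: {0}", residuals[0]);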
The Coefficients
property returns the matrix of regression coefficients of the model.
The Intercepts
property returns the vector of corresponding intercepts.
The StandardizedCoefficients
property returns a matrix of the standardized coefficients, based on centered and
normalized variables.
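A minimal C# sketch that retrieves and prints these quantities for model1:
var coefficients = model1.Coefficients;              // regression coefficients
var intercepts = model1.Intercepts;                  // corresponding intercepts
var standardized = model1.StandardizedCoefficients;  // based on centered, normalized variables
Console.WriteLine(coefficients);
Console.WriteLine(intercepts);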
Several properties give information about the factors and how they relate
to the dependent and independent variables. In PLS, both
the matrix of independent variables and the matrix of dependent variables
are decomposed into components, and similar terminology applies to both decompositions.
The XLoadings
property returns a matrix that contains the loadings,
and the XScores
property returns a matrix that contains the scores of the independent variables.
These are the factors T and P in the decomposition
of X into TPᵀ.
The YLoadings
property returns a matrix that contains the loadings,
and the YScores
property returns a matrix that contains the scores of the dependent variables.
These are the factors U and Q in the decomposition
of Y into UQᵀ.
In addition, the WeightMatrix
property returns a matrix containing the projection weights for the independent
variables.
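The C# sketch below retrieves these matrices for model1; the names T, P, U, Q
and W simply follow the decompositions described above:
var T = model1.XScores;       // scores of the independent variables
var P = model1.XLoadings;     // loadings of the independent variables
var U = model1.YScores;       // scores of the dependent variables
var Q = model1.YLoadings;     // loadings of the dependent variables
var W = model1.WeightMatrix;  // projection weights for the independent variables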
The Predict
method can be used to predict the values of the dependent variables
for new data. The method has three overloads, which all take two arguments.
The first overload takes a vector as its first argument. The vector
contains the values of the independent variables for which a prediction
should be made. The second argument, which is always optional,
specifies how the values in the vector relate to the variables in the model.
This overload returns a vector that contains the predictions for each
of the dependent variables.
The second and third overloads take a matrix and a data frame, respectively,
as their first argument. Each row in the matrix or data frame corresponds
to an observation. The methods return a matrix whose rows contain the
corresponding predictions for the dependent variables.
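For example, the C# sketch below computes predictions from model1 for new,
randomly generated data with the same 10 independent variables
(newObservation and newObservations are purely illustrative inputs):
var newObservation = Vector.CreateRandom(10);
// Vector overload: returns a vector with one prediction per dependent variable.
var singlePrediction = model1.Predict(newObservation);
var newObservations = Matrix.CreateRandom(5, 10);
// Matrix overload: returns a matrix with one row of predictions per observation.
var predictions = model1.Predict(newObservations);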
Verifying the Quality of the Model
One of the objectives of Partial Least Squares is to capture as much
as possible of the variance in both the dependent and
the independent variables.
The XVarianceExplained
and YVarianceExplained
properties return vectors that contain the proportion of variance explained
by each factor. The corresponding
XCumulativeVarianceExplained and
YCumulativeVarianceExplained properties
return the cumulative proportions.
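A short C# sketch that retrieves these proportions for model2, which was fitted
with 7 components earlier:
var xVariance = model2.XVarianceExplained;              // proportion of X variance per factor
var yVariance = model2.YVarianceExplained;              // proportion of Y variance per factor
var yCumulative = model2.YCumulativeVarianceExplained;  // cumulative proportions for Y
Console.WriteLine(yCumulative);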
The quality of a PLS model is often assessed using a validation test set.
The Press(Matrix<Double>, Matrix<Double>)
method computes the PRESS (Predicted REsidual Sum of Squares) of the
model for the supplied data. It takes two arguments.
The first is a matrix that contains the values of the independent variables
to be tested. The second argument is a matrix that contains the values of the
dependent variables. The method returns a vector of the PRESS values
for each dependent variable.
The RootMeanPress(Matrix<Double>, Matrix<Double>)
method returns a single value: the square root of the mean of these values.
These methods can be used to determine the ideal number of components
using cross validation. In the example below, we split the input
into a training set and a test set. We print the root mean PRESS value
for the test set for models with a varying number of components,
from 0 to 10:
var trainingSet = new Subset(all.RowCount, 0, 9);
var testSet = new Subset(all.RowCount, 10, 20);
var XTrain = independents.GetRows(trainingSet);
var YTrain = dependents.GetRows(trainingSet);
var model = new PartialLeastSquaresModel(YTrain, XTrain, 0);
for (int k = 0; k <= 10; k++)
{
model.NumberOfComponents = k;
model.Fit();
var XTest = independents.GetRows(testSet);
var YTest = dependents.GetRows(testSet);
double rmPress = model.RootMeanPress(YTest, XTest);
Console.WriteLine("{0}: {1:F6}", k, rmPress);
}
Dim trainingSet = New Subset(all.RowCount, 0, 9)
Dim testSet = New Subset(all.RowCount, 10, 20)
Dim XTrain = independents.GetRows(trainingSet)
Dim YTrain = dependents.GetRows(trainingSet)
Dim model = New PartialLeastSquaresModel(YTrain, XTrain, 0)
For k As Integer = 0 To 10
model.NumberOfComponents = k
model.Fit()
Dim XTest = independents.GetRows(testSet)
Dim YTest = dependents.GetRows(testSet)
Dim rmPress = model.RootMeanPress(YTest, XTest)
Console.WriteLine("{0}: {1:F6}", k, rmPress)
Next
let trainingSet = new Subset(all.RowCount, 0, 9)
let testSet = new Subset(all.RowCount, 10, 20)
let XTrain = independents.GetRows(trainingSet)
let YTrain = dependents.GetRows(trainingSet)
let model = new PartialLeastSquaresModel(YTrain, XTrain, 0)
for k in 0..10 do
model.NumberOfComponents <- k
model.Fit()
let XTest = independents.GetRows(testSet)
let YTest = dependents.GetRows(testSet)
let rmPress = model.RootMeanPress(YTest, XTest)
printfn "%d: %.6f" k rmPress