Solving the least squares problem
The minimum of the sum of squares is found by setting the gradient to zero.
Since the model contains m parameters, there are m gradient equations:

\[ \frac{\partial S}{\partial \beta_j} = 2 \sum_i r_i \frac{\partial r_i}{\partial \beta_j} = 0, \qquad j = 1, \ldots, m, \]

and since \(r_i = y_i - f(x_i, \boldsymbol\beta)\), the gradient equations become

\[ -2 \sum_i r_i \frac{\partial f(x_i, \boldsymbol\beta)}{\partial \beta_j} = 0, \qquad j = 1, \ldots, m. \]
The gradient equations apply to all least squares problems.
Each particular problem requires particular expressions for the model and
its partial derivatives.
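For instance, for a straight-line model \(f(x, \boldsymbol\beta) = \beta_1 + \beta_2 x\)
(an illustrative choice, not one prescribed by the text), the partial derivatives are
\(\partial f/\partial \beta_1 = 1\) and \(\partial f/\partial \beta_2 = x_i\), so the two
gradient equations reduce to

\[ \sum_i \left(y_i - \beta_1 - \beta_2 x_i\right) = 0, \qquad \sum_i x_i \left(y_i - \beta_1 - \beta_2 x_i\right) = 0. \]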
Linear least squares
A regression model is a linear one when the model comprises a linear combination
of the parameters, i.e.,

\[ f(x, \boldsymbol\beta) = \sum_{j=1}^{m} \beta_j \varphi_j(x), \]

where the function \(\varphi_j\) is a function of \(x\).

Letting \(X_{ij} = \frac{\partial f(x_i, \boldsymbol\beta)}{\partial \beta_j} = \varphi_j(x_i)\)
and collecting the observations \(y_i\) into the vector \(\mathbf{y}\),
we can then see that in that case the least-squares estimate (or estimator,
in the context of a random sample) is given by

\[ \hat{\boldsymbol\beta} = \left(X^{\mathsf{T}} X\right)^{-1} X^{\mathsf{T}} \mathbf{y}. \]
For a derivation of this estimate see Linear least squares (mathematics).
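As a minimal numerical sketch of this closed-form estimate, the following Python/NumPy
snippet fits a straight-line model \(f(x, \boldsymbol\beta) = \beta_1 + \beta_2 x\); the
data, model choice, and variable names are illustrative assumptions rather than anything
prescribed above.

```python
import numpy as np

# Synthetic data for the straight-line model f(x, beta) = beta_1 + beta_2 * x
# (the data and the model choice are illustrative assumptions).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# Design matrix X with X_ij = phi_j(x_i); here phi_1(x) = 1 and phi_2(x) = x.
X = np.column_stack([np.ones_like(x), x])

# Closed-form least-squares estimate: beta_hat = (X^T X)^{-1} X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# np.linalg.lstsq solves the same problem via a more numerically stable route.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)    # estimated [beta_1, beta_2]
print(beta_lstsq)  # should agree with beta_hat
```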
Non-linear least squares
There is, in some cases, a closed-form solution to a non-linear least squares problem –
but in general there is not. In the case of no closed-form solution, numerical algorithms
are used to find the value of the parameters \(\boldsymbol\beta\) that minimizes the objective.
Most algorithms involve choosing initial values for the parameters. Then,
the parameters are refined iteratively, that is, the values are obtained by successive
approximation:

\[ \beta_j^{\,k+1} = \beta_j^{\,k} + \Delta\beta_j, \]

where a superscript k is an iteration number, and the vector of increments
\(\Delta\boldsymbol\beta\) is called the shift vector. In some commonly used algorithms, at each iteration the
model may be linearized by approximation to a first-order Taylor series expansion
about \(\boldsymbol\beta^{k}\):

\[ f(x_i, \boldsymbol\beta) \approx f^{k}(x_i, \boldsymbol\beta) + \sum_j \frac{\partial f(x_i, \boldsymbol\beta)}{\partial \beta_j}\left(\beta_j - \beta_j^{\,k}\right) = f^{k}(x_i, \boldsymbol\beta) + \sum_j J_{ij}\,\Delta\beta_j. \]
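To make the linearization concrete, for an illustrative exponential model
\(f(x, \boldsymbol\beta) = \beta_1 e^{\beta_2 x}\) (an assumed example, also used in the
sketch at the end of this section), the Jacobian entries evaluated at the current iterate
\(\boldsymbol\beta^{k}\) are

\[ J_{i1} = \frac{\partial f}{\partial \beta_1} = e^{\beta_2^{k} x_i}, \qquad J_{i2} = \frac{\partial f}{\partial \beta_2} = \beta_1^{k} x_i\, e^{\beta_2^{k} x_i}. \]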
The Jacobian J is a function of constants, the independent variable and the parameters,
so it changes from one iteration to the next. The residuals are given by

\[ r_i = y_i - f^{k}(x_i, \boldsymbol\beta) - \sum_{j=1}^{m} J_{ij}\,\Delta\beta_j = \Delta y_i - \sum_{j=1}^{m} J_{ij}\,\Delta\beta_j, \]

where \(\Delta y_i = y_i - f^{k}(x_i, \boldsymbol\beta)\).
To minimize the sum of squares of \(r_i\), the gradient equation is set to zero and solved for
\(\Delta\beta_j\):

\[ -2 \sum_{i=1}^{n} J_{ij}\left(\Delta y_i - \sum_{k=1}^{m} J_{ik}\,\Delta\beta_k\right) = 0, \]

which, on rearrangement, become m simultaneous linear equations,
the normal equations:

\[ \sum_{i=1}^{n} \sum_{k=1}^{m} J_{ij} J_{ik}\,\Delta\beta_k = \sum_{i=1}^{n} J_{ij}\,\Delta y_i, \qquad j = 1, \ldots, m. \]
The normal equations are written in matrix notation as

\[ \left(\mathbf{J}^{\mathsf{T}} \mathbf{J}\right) \Delta\boldsymbol\beta = \mathbf{J}^{\mathsf{T}} \Delta\mathbf{y}. \]
These are the defining equations of the Gauss–Newton algorithm.
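The following is a minimal Gauss–Newton sketch in Python/NumPy that solves the normal
equations above at each iteration for the illustrative exponential model
\(f(x, \boldsymbol\beta) = \beta_1 e^{\beta_2 x}\); the model, data, starting guess, and
function names are assumptions made for this example, not part of the derivation above.

```python
import numpy as np

def model(x, beta):
    # Illustrative non-linear model f(x, beta) = beta_1 * exp(beta_2 * x).
    return beta[0] * np.exp(beta[1] * x)

def jacobian(x, beta):
    # J_ij = d f(x_i, beta) / d beta_j, evaluated at the current iterate.
    return np.column_stack([np.exp(beta[1] * x),
                            beta[0] * x * np.exp(beta[1] * x)])

def gauss_newton(x, y, beta0, n_iter=10):
    beta = np.asarray(beta0, dtype=float)
    for _ in range(n_iter):
        J = jacobian(x, beta)
        dy = y - model(x, beta)                 # Delta y_i = y_i - f^k(x_i, beta)
        # Normal equations: (J^T J) Delta beta = J^T Delta y
        delta = np.linalg.solve(J.T @ J, J.T @ dy)
        beta = beta + delta                     # apply the shift vector
    return beta

# Example usage with synthetic data generated from known parameters [2.0, 1.3].
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 25)
y = 2.0 * np.exp(1.3 * x) + rng.normal(scale=0.05, size=x.size)

# Gauss-Newton generally needs a reasonable starting guess; this one is chosen near the truth.
print(gauss_newton(x, y, beta0=[1.8, 1.2]))     # expect roughly [2.0, 1.3]
```

Solving the normal equations directly mirrors the derivation; in practice a least-squares
solve of the linearized problem (e.g. np.linalg.lstsq(J, dy)) is often preferred for
numerical stability over forming \(\mathbf{J}^{\mathsf{T}}\mathbf{J}\) explicitly.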