
Tuesday, 2 May 2017

Lesson Note on Solving the Least Squares Problem


Solving the least squares problem

The minimum of the sum of squares is found by setting the gradient to zero. 
Since the model contains m parameters, there are m gradient equations:
\frac{\partial S}{\partial \beta_j} = 2\sum_i r_i \frac{\partial r_i}{\partial \beta_j} = 0,\quad j = 1, \ldots, m,
and since r_i = y_i - f(x_i, \boldsymbol{\beta}), the gradient equations become
-2\sum_i r_i \frac{\partial f(x_i, \boldsymbol{\beta})}{\partial \beta_j} = 0,\quad j = 1, \ldots, m.
The gradient equations apply to all least squares problems. 
Each particular problem requires particular expressions for the model and 
its partial derivatives.
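As a quick illustration (this example is an added sketch, not part of the original note), the following Python/SymPy snippet writes down the sum of squares for a two-parameter model f(x, \boldsymbol{\beta}) = \beta_1 e^{\beta_2 x} on three made-up data points and checks that \partial S / \partial \beta_j equals -2 \sum_i r_i \, \partial f(x_i, \boldsymbol{\beta}) / \partial \beta_j for each parameter:

```python
# Illustrative sketch (not part of the lesson): for the model f(x, beta) = b1*exp(b2*x),
# verify symbolically that dS/db_j = -2 * sum_i r_i * df(x_i, beta)/db_j.
import sympy as sp

b1, b2 = sp.symbols('b1 b2')
x_data = [0.0, 1.0, 2.0]          # made-up data points
y_data = [2.0, 5.4, 14.9]

f = lambda x: b1 * sp.exp(b2 * x)                      # the model f(x, beta)
r = [y - f(x) for x, y in zip(x_data, y_data)]         # residuals r_i = y_i - f(x_i, beta)
S = sum(ri**2 for ri in r)                             # sum of squares S

for bj in (b1, b2):
    lhs = sp.diff(S, bj)                               # dS/db_j
    rhs = -2 * sum(ri * sp.diff(f(x), bj) for ri, x in zip(r, x_data))
    print(sp.simplify(lhs - rhs))                      # prints 0 for each parameter
```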

Linear least squares

A regression model is a linear one when the model comprises a linear combination 
of the parameters, i.e.,
f(x, \boldsymbol{\beta}) = \sum_{j=1}^{m} \beta_j \phi_j(x),
where each \phi_j is a function of x.
Letting
X_{ij} = \frac{\partial f(x_i, \boldsymbol{\beta})}{\partial \beta_j} = \phi_j(x_i),
we can see that, in that case, the least-squares estimate (or estimator, in the context of a random sample) \hat{\boldsymbol{\beta}} is given by
\hat{\boldsymbol{\beta}} = (X^{\mathsf{T}} X)^{-1} X^{\mathsf{T}} \boldsymbol{y}.
For a derivation of this estimate see Linear least squares (mathematics).
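As a worked example (an added sketch with made-up data, not part of the original note), fitting a straight line f(x, \boldsymbol{\beta}) = \beta_1 + \beta_2 x corresponds to the basis functions \phi_1(x) = 1 and \phi_2(x) = x, and the estimate can be computed directly from the formula above:

```python
# Illustrative sketch (not part of the lesson; data values are made up):
# fit a straight line with basis functions phi_1(x) = 1 and phi_2(x) = x
# using beta_hat = (X^T X)^{-1} X^T y.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 1.9, 3.2, 3.9])

# Design matrix with X_ij = phi_j(x_i)
X = np.column_stack([np.ones_like(x), x])

# Textbook formula (fine for illustration)
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# Numerically preferable equivalent: an SVD-based least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)      # [intercept, slope]
print(beta_lstsq)    # same values, computed more stably
```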

Non-linear least squares


In some cases there is a closed-form solution to a non-linear least squares problem, but in general there is not. When no closed-form solution exists, numerical algorithms are used to find the values of the parameters \boldsymbol{\beta} that minimize the objective.
Most algorithms involve choosing initial values for the parameters. Then, 
the parameters are refined iteratively, that is, the values are obtained by successive 
approximation:
\beta_j^{k+1} = \beta_j^{k} + \Delta\beta_j,
where the superscript k is the iteration number, and the vector of increments \Delta\boldsymbol{\beta} is called the shift vector. In some commonly used algorithms, at each iteration the model may be linearized by a first-order Taylor series expansion about \boldsymbol{\beta}^{k}:
\begin{aligned}
f(x_i, \boldsymbol{\beta}) &= f^{k}(x_i, \boldsymbol{\beta}) + \sum_{j} \frac{\partial f(x_i, \boldsymbol{\beta})}{\partial \beta_j} \left(\beta_j - \beta_j^{k}\right) \\
&= f^{k}(x_i, \boldsymbol{\beta}) + \sum_{j} J_{ij}\, \Delta\beta_j.
\end{aligned}
The Jacobian J is a function of constants, the independent variable and the parameters, 
so it changes from one iteration to the next. The residuals are given by
r_i = y_i - f^{k}(x_i, \boldsymbol{\beta}) - \sum_{k=1}^{m} J_{ik}\, \Delta\beta_k = \Delta y_i - \sum_{j=1}^{m} J_{ij}\, \Delta\beta_j.
To minimize the sum of squares of r_{i}, the gradient equation is set to zero and solved for
 \Delta \beta _{j}:
-2\sum_{i=1}^{n} J_{ij} \left(\Delta y_i - \sum_{k=1}^{m} J_{ik}\, \Delta\beta_k\right) = 0,
which, on rearrangement, become m simultaneous linear equations, 
the normal equations:
\sum_{i=1}^{n} \sum_{k=1}^{m} J_{ij} J_{ik}\, \Delta\beta_k = \sum_{i=1}^{n} J_{ij}\, \Delta y_i \qquad (j = 1, \ldots, m).
The normal equations are written in matrix notation as
(\mathbf{J}^{\mathsf{T}} \mathbf{J})\, \Delta\boldsymbol{\beta} = \mathbf{J}^{\mathsf{T}}\, \Delta\mathbf{y}.
These are the defining equations of the Gauss–Newton algorithm.
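To make the iteration concrete, here is a minimal Gauss–Newton sketch in Python for the model f(x, \boldsymbol{\beta}) = \beta_1 e^{\beta_2 x} (the model, data and starting values are illustrative choices, not part of the original note). At each step it builds the Jacobian, solves the normal equations (J^T J)\,\Delta\boldsymbol{\beta} = J^T\,\Delta y, and applies the shift vector:

```python
# Illustrative Gauss-Newton sketch (model, data and starting values are example
# choices, not from the lesson) for f(x, beta) = beta_1 * exp(beta_2 * x).
import numpy as np

def model(x, beta):
    return beta[0] * np.exp(beta[1] * x)

def jacobian(x, beta):
    # J_ij = d f(x_i, beta) / d beta_j, evaluated at the current iterate beta^k
    return np.column_stack([np.exp(beta[1] * x),
                            beta[0] * x * np.exp(beta[1] * x)])

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([2.0, 5.4, 14.9, 40.1])   # roughly 2 * exp(x), so beta is near [2, 1]

beta = np.array([2.5, 0.8])            # initial values beta^0, chosen near the solution
for k in range(20):
    J = jacobian(x, beta)
    dy = y - model(x, beta)            # Delta y_i = y_i - f^k(x_i, beta)
    # Normal equations: (J^T J) Delta beta = J^T Delta y
    delta = np.linalg.solve(J.T @ J, J.T @ dy)
    beta = beta + delta                # beta^{k+1} = beta^k + Delta beta (the shift vector)
    if np.linalg.norm(delta) < 1e-10:  # stop once the shift is negligible
        break

print(beta)                            # converges to approximately [2.0, 1.0]
```

Plain Gauss–Newton is only locally convergent, which is why the starting values in this sketch are chosen close to the solution; practical implementations usually add damping or a line search.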