How are maximum likelihood techniques similar to OLS estimation?

Recall that in linear regression (OLS) we pick a best-fitting line to describe our data. Our criterion for 'best-fitting line' was the line that minimizes the sum of squared errors. We square the errors because, when the model includes a constant, the errors themselves always sum to zero.
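To make the OLS criterion concrete, here is a small sketch in plain Python (not Stata) on made-up data. The numbers and the names `x`, `y`, `a_hat`, `b_hat` and `sse` are assumptions for illustration only. It fits a line with one regressor and a constant, then checks that the residuals sum to (numerically) zero, that the sum of squared errors is positive, and that nudging the slope away from the OLS value makes the sum of squared errors larger.

```python
# Toy data (hypothetical, for illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def sse(a, b):
    """Sum of squared errors for the line y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form OLS estimates for one regressor plus a constant.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b_hat = s_xy / s_xx
a_hat = y_bar - b_hat * x_bar

# Residuals from the fitted line.
residuals = [yi - (a_hat + b_hat * xi) for xi, yi in zip(x, y)]

print("a_hat, b_hat:        ", a_hat, b_hat)
print("sum of residuals:    ", sum(residuals))         # zero up to rounding
print("SSE at the OLS line: ", sse(a_hat, b_hat))      # positive, but minimal
print("SSE at a nudged line:", sse(a_hat, b_hat + 0.1))
```

Any other intercept or slope you try in `sse` should give a larger value than the OLS pair.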

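In more complex models, such as probit and logit, a straight line and squared errors no longer describe a good fit, so we need a different criterion for 'best-fitting'. Stata knows what that criterion should be for probit and logit models. No formula solves for the best coefficients directly, so Stata iterates through a maximization procedure, updating its estimates until the criterion stops improving.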

In words, you could think of the technique like this: if our assumption about the distribution of the data is correct, then Stata will pick those values of bhat which make observing our sample MOST likely. This is why the technique is called maximum likelihood.
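The idea of iterating toward the most likely coefficients can be sketched in plain Python for a logit model with one regressor and a constant. This is a generic Newton-Raphson routine on made-up data, not a description of Stata's internal optimizer. The data and the names `log_likelihood`, `b0`, `b1` and `history` are assumptions for illustration.

```python
import math

# Toy data (hypothetical): y is a 0/1 outcome, x is a single regressor.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]


def log_likelihood(b0, b1):
    """Log of the probability of observing the sample under a logit model."""
    ll = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return ll


b0, b1 = 0.0, 0.0                      # starting values
history = [log_likelihood(b0, b1)]     # log likelihood at each iteration

for iteration in range(25):
    # Gradient (g) and negative Hessian (h) of the log likelihood.
    g0 = g1 = 0.0
    h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi

    # Newton step: solve the 2x2 system h * step = g by hand.
    det = h00 * h11 - h01 * h01
    step0 = (h11 * g0 - h01 * g1) / det
    step1 = (h00 * g1 - h01 * g0) / det
    b0 += step0
    b1 += step1
    history.append(log_likelihood(b0, b1))

    # Stop once the estimates have essentially stopped moving.
    if abs(step0) + abs(step1) < 1e-10:
        break

for i, ll in enumerate(history):
    print(f"Iteration {i}: log likelihood = {ll:.6f}")
print("bhat (constant, slope):", b0, b1)
```

The printed log likelihood rises from its starting value and then levels off, much like the iteration log Stata displays when it fits a logit or probit model. The final `b0` and `b1` are the values under which this particular sample is most likely.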

 
