Is LDA a classifier?

Linear Discriminant Analysis - Introduction

Author: Hans Lohninger

Linear discriminant analysis (LDA) is a method for distinguishing between two or more groups of samples. To develop a classifier based on LDA, the following steps must be carried out:

  • Definition of the groups
  • Determination of the discriminant function
  • Estimation of the parameters of the discriminant function
  • Testing the discriminant function
  • Application

Definition of the groups:

The groups to be distinguished can arise either from the problem under investigation or from a previous analysis, such as a cluster analysis. The number of groups is not limited to two, although distinguishing between two groups is the most common case. Note that the number of groups must not exceed the number of variables. Another requirement is that the groups have the same covariance structure (i.e., their covariance matrices must be comparable).

Determination of the discriminant function:

In principle, any mathematical function can be used as a discriminant function. In LDA, it is a linear function of the form

$$ y = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_n x_n $$

where the $x_i$ are the descriptor variables. The coefficients $a_i$ must be determined so that the groups are separated as well as possible. Note that the linear discriminant function is formally similar to a multiple linear regression (MLR) model. In fact, for two groups MLR can be applied directly if the dependent variable $y$ is replaced by the weighted class numbers $c_1$ (for all samples of group 1) and $c_2$ (for all samples of group 2):

$$ c_1 = \frac{n_2}{n_1 + n_2}, \qquad c_2 = -\frac{n_1}{n_1 + n_2} $$

where $n_1$ and $n_2$ are the numbers of samples in groups 1 and 2.
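The regression route can be checked numerically. The Python sketch below uses NumPy; the function names, the simulated data, and the comparison against Fisher's direction (computed from the group means and the within-group scatter matrix) are illustrative assumptions, not part of the original text. It fits an ordinary least-squares model with the weighted class numbers $c_1$ and $c_2$ as the dependent variable and compares the resulting coefficient vector with that direction.

```python
import numpy as np

def lda_via_regression(X1, X2):
    """Fit y = a0 + a1*x1 + ... + an*xn by ordinary least squares, using the
    weighted class numbers c1 = n2/(n1+n2) for group 1 and
    c2 = -n1/(n1+n2) for group 2 as the dependent variable y."""
    n1, n2 = len(X1), len(X2)
    n = n1 + n2
    X = np.vstack([X1, X2])
    y = np.concatenate([np.full(n1, n2 / n), np.full(n2, -n1 / n)])
    # Prepend a column of ones so that the intercept a0 is estimated as well.
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]  # a0 and (a1, ..., an)

def fisher_direction(X1, X2):
    """Direction S_W^{-1} (m1 - m2) built from the group means m1, m2 and the
    within-group scatter matrix S_W, for comparison."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S_w = (np.cov(X1, rowvar=False) * (len(X1) - 1)
           + np.cov(X2, rowvar=False) * (len(X2) - 1))
    return np.linalg.solve(S_w, m1 - m2)

# Hypothetical example data: two groups with unequal sizes.
rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 1.0, size=(30, 2))
X2 = rng.normal([2.0, 1.0], 1.0, size=(20, 2))

a0, a = lda_via_regression(X1, X2)
w = fisher_direction(X1, X2)
print("regression direction:", a / np.linalg.norm(a))
print("Fisher direction:    ", w / np.linalg.norm(w))
```

After normalization both vectors should agree up to rounding, because with this coding the least-squares slope vector is a positive multiple of the Fisher direction; only the overall scale and the intercept $a_0$ differ.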

To get a better understanding of how the discriminant function works, try out this interactive example.

Estimation of the parameters of the discriminant function:

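As the interactive example shows, there is only one direction of the discriminant function that gives the best separation. Determining the coefficients of the discriminant function is computationally simple: the function is constructed so that the separation (distance) between the groups is maximal while the spread within the groups is minimal. For two groups, this leads to coefficients proportional to $S_W^{-1}(\bar{x}_1 - \bar{x}_2)$, where $\bar{x}_1$ and $\bar{x}_2$ are the group mean vectors and $S_W$ is the pooled within-group scatter matrix, so only the group means and one matrix inversion are needed.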
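For two groups, the "maximal distance between, minimal spread within" criterion is usually written as Fisher's ratio: the squared distance between the projected group means divided by the within-group scatter along the projection direction. The following Python/NumPy sketch assumes that this is the criterion meant here; the helper names and the simulated data are hypothetical. It computes the closed-form coefficients and checks that none of 1000 other directions yields a larger ratio.

```python
import numpy as np

def within_scatter(X1, X2):
    """Pooled within-group scatter matrix S_W (summed over both groups)."""
    D1 = X1 - X1.mean(axis=0)
    D2 = X2 - X2.mean(axis=0)
    return D1.T @ D1 + D2.T @ D2

def fisher_criterion(w, X1, X2):
    """Squared distance between the projected group means divided by the
    within-group scatter along direction w (Fisher's ratio)."""
    d = X1.mean(axis=0) - X2.mean(axis=0)
    return (w @ d) ** 2 / (w @ within_scatter(X1, X2) @ w)

def fisher_coefficients(X1, X2):
    """Coefficients a = S_W^{-1} (m1 - m2), which maximize Fisher's ratio."""
    d = X1.mean(axis=0) - X2.mean(axis=0)
    return np.linalg.solve(within_scatter(X1, X2), d)

# Hypothetical data with a shared covariance structure, as LDA assumes.
rng = np.random.default_rng(1)
cov = [[1.0, 0.8], [0.8, 1.0]]
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=40)
X2 = rng.multivariate_normal([1.5, 0.5], cov, size=40)

a = fisher_coefficients(X1, X2)
best = fisher_criterion(a, X1, X2)

# Sweep unit vectors over a half circle (w and -w give the same ratio).
angles = np.linspace(0.0, np.pi, 1000)
others = [fisher_criterion(np.array([np.cos(t), np.sin(t)]), X1, X2)
          for t in angles]
print("Fisher ratio of a:", best, " best of sweep:", max(others))
```

Because the ratio is unchanged when $a$ is rescaled, "only one direction" here means one line through the origin; the length and sign of the coefficient vector are arbitrary.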

Testing the discriminant function:

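After the discriminant function has been parameterized, it must be tested, either with an independent test data set or by cross-validation. In both cases, the classification results for the test data should be comparable to those for the training data; a clearly worse test result indicates that the function is overfitted to the training data.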
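A minimal sketch of testing by leave-one-out cross-validation, assuming a two-group LDA whose cut-off lies halfway between the projected group means (i.e., equal prior probabilities for both groups). The functions `fit`, `predict`, `accuracy`, and `loo_accuracy` and the simulated data are hypothetical illustrations in Python/NumPy:

```python
import numpy as np

def fit(X1, X2):
    """Two-group LDA: coefficients S_W^{-1}(m1 - m2) and an intercept that
    places the cut-off halfway between the projected group means."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    D1, D2 = X1 - m1, X2 - m2
    a = np.linalg.solve(D1.T @ D1 + D2.T @ D2, m1 - m2)
    a0 = -(a @ (m1 + m2)) / 2
    return a0, a

def predict(model, X):
    """Assign group 1 if the discriminant score is positive, else group 2."""
    a0, a = model
    return np.where(a0 + X @ a > 0, 1, 2)

def accuracy(model, X, labels):
    return np.mean(predict(model, X) == labels)

def loo_accuracy(X, labels):
    """Leave-one-out: refit without sample i, then classify sample i."""
    hits = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        Xk, lk = X[keep], labels[keep]
        model = fit(Xk[lk == 1], Xk[lk == 2])
        hits += predict(model, X[i:i + 1])[0] == labels[i]
    return hits / len(X)

# Hypothetical data: 25 samples per group.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(25, 2)),
               rng.normal([2.5, 1.0], 1.0, size=(25, 2))])
labels = np.array([1] * 25 + [2] * 25)

train_acc = accuracy(fit(X[labels == 1], X[labels == 2]), X, labels)
cv_acc = loo_accuracy(X, labels)
print(f"training: {train_acc:.2f}  leave-one-out: {cv_acc:.2f}")
```

Leave-one-out is used here only because it needs no extra data; an independent test set or k-fold cross-validation would be evaluated the same way, by comparing its accuracy with the training accuracy.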

Application:

Discriminant analysis can be used either for analysis or for classification:

  • Analysis: How can the data be interpreted? Which variables contribute most to the difference between the groups?
  • Classification: If a discriminant function can be found that gives a satisfactory separation, it can be used to classify new data.
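Both uses can be sketched in a few lines of Python/NumPy. For the analysis part, the sketch assumes one common convention (not necessarily the author's): each coefficient is multiplied by the pooled within-group standard deviation of its variable, so that variables measured on different scales become comparable. For classification, it again assumes equal prior probabilities, so the cut-off lies halfway between the projected group means. The function `fit` and the data are hypothetical.

```python
import numpy as np

def fit(X1, X2):
    """Two-group LDA returning intercept a0, coefficients a, and the pooled
    within-group standard deviation of each variable."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    D1, D2 = X1 - m1, X2 - m2
    S_w = D1.T @ D1 + D2.T @ D2
    a = np.linalg.solve(S_w, m1 - m2)
    a0 = -(a @ (m1 + m2)) / 2  # cut-off halfway between the groups
    pooled_sd = np.sqrt(np.diag(S_w) / (len(X1) + len(X2) - 2))
    return a0, a, pooled_sd

# Hypothetical data: variable 0 separates the groups, variable 1 is noise
# measured on a much larger scale.
rng = np.random.default_rng(3)
X1 = np.column_stack([rng.normal(0.0, 1.0, 30), rng.normal(0.0, 5.0, 30)])
X2 = np.column_stack([rng.normal(3.0, 1.0, 30), rng.normal(0.0, 5.0, 30)])

a0, a, sd = fit(X1, X2)

# Analysis: standardized coefficients (a_i times pooled SD of x_i).
importance = np.abs(a * sd)
print("standardized |coefficients|:", importance)

# Classification: score > 0 means group 1, otherwise group 2.
new = np.array([[0.2, 4.0], [2.9, -3.0]])
print("assigned groups:", np.where(a0 + new @ a > 0, 1, 2))
```

The raw coefficients alone would be misleading for interpretation here, because their size also depends on the measurement scale of each variable; standardizing removes that effect before comparing contributions.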