Linear Discriminant Analysis
Using Linear Discriminant Analysis (LDA) for dimension reduction
Introduction
Linear Discriminant Analysis (LDA) is a dimensionality reduction technique most commonly used as a pre-processing step in machine learning and pattern-classification applications. The objective is to project the data onto a lower-dimensional space with good class separability in order to avoid overfitting (the "curse of dimensionality") and to reduce computational cost, while at the same time preserving the essential class-specific characteristics of the data.
How does LDA work?
Linear Discriminant Analysis is a supervised algorithm: it takes the class labels into account while carrying out dimensionality reduction. It looks for a new feature space that maximizes class separability, using an approach closely related to the one used in Principal Component Analysis (PCA).
PCA is a statistical procedure that converts a set of possibly correlated variables into a set of linearly uncorrelated features called principal components. It essentially drops the least informative directions while retaining the valuable ones, by finding the principal component axes along which the variance of the data is highest.
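For comparison, here is a quick PCA sketch on the iris measurements (not part of the original walkthrough): the components are ranked purely by explained variance, with no use of the class labels.

# PCA on the four iris measurements: components are ordered by the variance
# they explain, ignoring the Species labels entirely.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
summary(pca)   # proportion of variance explained by each principal component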
In LDA, we maximize the separation between the classes by maximizing the distance between the class centroids while simultaneously minimizing the within-class variance, so that well-separated, non-overlapping clusters are formed. Minimizing the within-class variance keeps each class compact and tightly clustered. The resulting linear combinations of features are called discriminant functions.
The between-class scatter (class variability) can be defined as

S_B = \sum_{i=1}^{c} N_i (\mu_i - \mu)(\mu_i - \mu)^T

and the within-class scatter can be defined by

S_W = \sum_{i=1}^{c} \sum_{x \in C_i} (x - \mu_i)(x - \mu_i)^T

where c is the number of classes, N_i and \mu_i are the sample count and mean of class i, and \mu is the overall mean. LDA seeks the projection w that maximizes the ratio w^T S_B w / w^T S_W w.
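The two scatter matrices can be computed directly in R. The following is a minimal sketch (not from the original post) using the iris measurements that appear later in the walkthrough:

# Between-class and within-class scatter matrices for the four iris measurements.
X  <- as.matrix(iris[, 1:4])
y  <- iris$Species
mu <- colMeans(X)                              # overall mean

Sb <- matrix(0, ncol(X), ncol(X))              # between-class scatter S_B
Sw <- matrix(0, ncol(X), ncol(X))              # within-class scatter S_W
for (cl in levels(y)) {
  Xc   <- X[y == cl, , drop = FALSE]
  mu_c <- colMeans(Xc)
  d    <- mu_c - mu
  Sb   <- Sb + nrow(Xc) * outer(d, d)          # N_i (mu_i - mu)(mu_i - mu)^T
  Sw   <- Sw + crossprod(sweep(Xc, 2, mu_c))   # sum over x in C_i of (x - mu_i)(x - mu_i)^T
}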
Discrimination Rules:
- Maximum likelihood: Assigns x to the group that maximizes population (group) density.
- Bayes Discriminant Rule: Assigns x to the group that maximizes the product of prior probability and population density.
- Fisher’s linear discriminant rule: Maximizes the ratio of between-class to within-class scatter and finds the linear combination of the predictors that best separates the groups (a short sketch follows this list).
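Fisher's rule can be sketched in a few lines of R, reusing the Sb and Sw matrices computed above: the leading eigenvectors of solve(Sw) %*% Sb give the discriminant directions (they span the same subspace as the coefficients returned later by MASS::lda, up to scaling).

# Fisher's criterion: discriminant directions are the leading eigenvectors of
# solve(Sw) %*% Sb (Sb, Sw and X come from the scatter-matrix sketch above).
eig <- eigen(solve(Sw) %*% Sb)
W   <- Re(eig$vectors[, 1:2])   # top two directions; Re() drops tiny imaginary parts
Z   <- X %*% W                  # project the data onto the two discriminant axes
head(Z)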
Applying LDA to the IRIS dataset
1. Importing the IRIS dataset
library(datasets)
head(iris)

# Output:
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
2. Descriptive Statistics
summary(iris)

# Output:
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
 Median :5.800   Median :3.000   Median :4.350   Median :1.300
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
3. Exploratory Data Analysis
# Visualizing pairwise relationships between the features
plot(iris)
# Density & frequency analysis with histograms
# We can observe that most of the attributes exhibit an approximately normal distribution

# Sepal Length
library(ggplot2)
ggplot(data = iris, aes(x = Sepal.Length)) +
  geom_histogram(color = "black", aes(fill = Species)) +
  xlab("Sepal Length (cm)") +
  ylab("Frequency") +
  theme(legend.position = "none") +
  ggtitle("Histogram of Sepal Length") +
  geom_vline(data = iris, aes(xintercept = mean(Sepal.Length)))

Similarly, the other features can be plotted.
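One way to plot all four features at once, sketched here with tidyr's pivot_longer (this block is not in the original post), is to reshape the data to long format and use facets:

library(ggplot2)
library(tidyr)

# Reshape to long format: one row per (feature, value) pair, keeping Species.
iris_long <- pivot_longer(iris, cols = -Species,
                          names_to = "Feature", values_to = "Value")

# One histogram panel per feature, coloured by species.
ggplot(iris_long, aes(x = Value, fill = Species)) +
  geom_histogram(color = "black", bins = 30) +
  facet_wrap(~ Feature, scales = "free_x") +
  ylab("Frequency")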
# Now we will plot boxplots for the features and try to identify any outliers
# present in the data. The dots in a boxplot indicate potential outliers; not
# all of them are necessarily true outliers, so human verification of such
# points is essential.

# For Sepal Length
ggplot(iris, aes(Species, Sepal.Length, fill = Species)) +
  geom_boxplot() +
  scale_y_continuous("Sepal Length (cm)") +
  labs(title = "Iris Sepal Length Box Plot", x = "Species")
# Let's plot a scatter plot to visualize the relation between the features.
# We can see a much stronger relation between petal width and petal length than
# between the sepal measurements, implying the petal measurements play a crucial
# role in separating the species.
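A minimal sketch of such a scatter plot (the exact plotting call is not shown in the original post):

ggplot(iris, aes(x = Petal.Length, y = Petal.Width, color = Species)) +
  geom_point() +
  labs(title = "Iris Petal Length vs Petal Width",
       x = "Petal Length (cm)", y = "Petal Width (cm)")

The model that follows is trained on a TrainingSet and evaluated on a TestSet, but the post does not show how those objects are created. One plausible split, assuming roughly two-thirds of the rows for training (100 training and 50 test rows, consistent with the confusion matrices reported below):

# Hypothetical train/test split; the original split is not shown, so the exact
# confusion-matrix counts will depend on the seed and split chosen.
set.seed(42)
trainIndex  <- sample(seq_len(nrow(iris)), size = 100)
TrainingSet <- iris[trainIndex, ]
TestSet     <- iris[-trainIndex, ]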
# Building the LDA model of iris using the training set
library(MASS)
iris.lda <- lda(Species ~ ., data = TrainingSet)

# Predict the training set using the LDA model
predict.iris <- predict(iris.lda)
table(TrainingSet$Species, predict.iris$class)

# Predict the test set using the LDA model
predictLDATest <- predict(iris.lda, newdata = TestSet)
table(TestSet$Species, predictLDATest$class)

# Output:
Coefficients of linear discriminants:
                    LD1         LD2
Sepal.Length  0.8293776  0.02410215
Sepal.Width   1.5344731  2.16452123
Petal.Length -2.2012117 -0.93192121
Petal.Width  -2.8104603  2.83918785

Confusion matrix of the training set:
             setosa versicolor virginica
  setosa         34          0         0
  versicolor      0         34         0
  virginica       0          1        31

Confusion matrix of the test set:
             setosa versicolor virginica
  setosa         16          0         0
  versicolor      0         15         1
  virginica       0          0        18

From these results we can see that the misclassification rate on the test set is as low as 2% (1 out of 50 samples).
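That rate can be computed directly from the objects above; a small sketch, together with a plot of the training data in the reduced LD1/LD2 space:

# Misclassification rate on the test set (1 error out of 50 samples = 2%).
mean(predictLDATest$class != TestSet$Species)

# Visualize the training data in the two-dimensional discriminant space.
plot(predict.iris$x, col = TrainingSet$Species,
     xlab = "LD1", ylab = "LD2", pch = 19)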
Thus, by applying LDA we could successfully map the data to a lower dimensional space without losing much of its original characteristics.