Linear Discriminant Analysis
Using Linear Discriminant Analysis (LDA) for dimension reduction
Introduction
Linear Discriminant Analysis (LDA) is a dimensionality reduction technique most commonly used as a pre-processing step in machine learning and pattern-classification applications. The objective is to project the data onto a lower-dimensional space with good class separability in order to avoid overfitting (the "curse of dimensionality") and to reduce computational cost, while at the same time preserving the essential class-specific characteristics of the data.
How does LDA work?
Linear Discriminant Analysis is a supervised algorithm: it takes the class labels into account while carrying out dimensionality reduction. It looks for a new feature space that maximizes class separability, using an approach closely related to the one used in Principal Component Analysis (PCA).
PCA is a statistical procedure that converts a set of possibly correlated variables into a set of linearly uncorrelated features called principal components. It essentially drops the least informative directions while retaining the valuable ones, by finding the principal component axes along which the variance of the data is highest.
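For comparison, here is a quick PCA sketch on the iris measurements (not part of the original walkthrough): the components are ranked purely by explained variance, with no use of the class labels.

# PCA on the four iris measurements: components are ordered by the variance
# they explain, ignoring the Species labels entirely.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
summary(pca)   # proportion of variance explained by each principal component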
In LDA, we maximize the separation between the classes by maximizing the distance between the class centroids while simultaneously minimizing the within-class variance, so that well-separated, non-overlapping clusters are formed. Minimizing the within-class variance keeps each class compact and tightly clustered. The resulting linear combinations of features are called discriminant functions.
The between-class scatter (class variability) can be defined as

S_B = \sum_{i=1}^{c} N_i (\mu_i - \mu)(\mu_i - \mu)^T

and the within-class scatter can be defined by

S_W = \sum_{i=1}^{c} \sum_{x \in C_i} (x - \mu_i)(x - \mu_i)^T

where c is the number of classes, N_i and \mu_i are the sample count and mean of class i, and \mu is the overall mean. LDA seeks the projection w that maximizes the ratio w^T S_B w / w^T S_W w.
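The two scatter matrices can be computed directly in R. The following is a minimal sketch (not from the original post) using the iris measurements that appear later in the walkthrough:

# Between-class and within-class scatter matrices for the four iris measurements.
X  <- as.matrix(iris[, 1:4])
y  <- iris$Species
mu <- colMeans(X)                              # overall mean

Sb <- matrix(0, ncol(X), ncol(X))              # between-class scatter S_B
Sw <- matrix(0, ncol(X), ncol(X))              # within-class scatter S_W
for (cl in levels(y)) {
  Xc   <- X[y == cl, , drop = FALSE]
  mu_c <- colMeans(Xc)
  d    <- mu_c - mu
  Sb   <- Sb + nrow(Xc) * outer(d, d)          # N_i (mu_i - mu)(mu_i - mu)^T
  Sw   <- Sw + crossprod(sweep(Xc, 2, mu_c))   # sum over x in C_i of (x - mu_i)(x - mu_i)^T
}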
Discrimination Rules:
- Maximum likelihood: Assigns x to the group that maximizes population (group) density.
- Bayes Discriminant Rule: Assigns x to the group that maximizes the product of prior probability and population density.
- Fisher’s linear discriminant rule: Maximizes the ratio of between-class to within-class scatter and finds the linear combination of the predictors that best separates the groups (a short sketch follows this list).
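Fisher's rule can be sketched in a few lines of R, reusing the Sb and Sw matrices computed above: the leading eigenvectors of solve(Sw) %*% Sb give the discriminant directions (they span the same subspace as the coefficients returned later by MASS::lda, up to scaling).

# Fisher's criterion: discriminant directions are the leading eigenvectors of
# solve(Sw) %*% Sb (Sb, Sw and X come from the scatter-matrix sketch above).
eig <- eigen(solve(Sw) %*% Sb)
W   <- Re(eig$vectors[, 1:2])   # top two directions; Re() drops tiny imaginary parts
Z   <- X %*% W                  # project the data onto the two discriminant axes
head(Z)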
Applying LDA to the IRIS dataset
1. Importing the IRIS dataset
library(datasets)
head(iris)

# Output:
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
2. Descriptive Statistics
summary(iris)

# Output:
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
 Median :5.800   Median :3.000   Median :4.350   Median :1.300
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
3. Exploratory Data Analysis
# Visualizing pairwise relationships between the features
plot(iris)
# Density & frequency analysis with histograms
# We can observe that most of the attributes exhibit an approximately normal distribution

# Sepal Length
library(ggplot2)
ggplot(data = iris, aes(x = Sepal.Length)) +
  geom_histogram(color = "black", aes(fill = Species)) +
  xlab("Sepal Length (cm)") +
  ylab("Frequency") +
  theme(legend.position = "none") +
  ggtitle("Histogram of Sepal Length") +
  geom_vline(data = iris, aes(xintercept = mean(Sepal.Length)))

Similarly, the other features can be plotted.
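One way to plot all four features at once, sketched here with tidyr's pivot_longer (this block is not in the original post), is to reshape the data to long format and use facets:

library(ggplot2)
library(tidyr)

# Reshape to long format: one row per (feature, value) pair, keeping Species.
iris_long <- pivot_longer(iris, cols = -Species,
                          names_to = "Feature", values_to = "Value")

# One histogram panel per feature, coloured by species.
ggplot(iris_long, aes(x = Value, fill = Species)) +
  geom_histogram(color = "black", bins = 30) +
  facet_wrap(~ Feature, scales = "free_x") +
  ylab("Frequency")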
# Now we will plot boxplots for the features and try to identify any outliers
# present in the data. The dots in a boxplot indicate potential outliers; not
# all of them are necessarily true outliers, so human verification of such
# points is essential.

# For Sepal Length
ggplot(iris, aes(Species, Sepal.Length, fill = Species)) +
  geom_boxplot() +
  scale_y_continuous("Sepal Length (cm)") +
  labs(title = "Iris Sepal Length Box Plot", x = "Species")
# Let's plot a scatter plot to visualize the relation between the features.
# We can see a much stronger relation between petal width and petal length than
# between the sepal measurements, implying the petal measurements play a crucial
# role in separating the species.
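A minimal sketch of such a scatter plot (the exact plotting call is not shown in the original post):

ggplot(iris, aes(x = Petal.Length, y = Petal.Width, color = Species)) +
  geom_point() +
  labs(title = "Iris Petal Length vs Petal Width",
       x = "Petal Length (cm)", y = "Petal Width (cm)")

The model that follows is trained on a TrainingSet and evaluated on a TestSet, but the post does not show how those objects are created. One plausible split, assuming roughly two-thirds of the rows for training (100 training and 50 test rows, consistent with the confusion matrices reported below):

# Hypothetical train/test split; the original split is not shown, so the exact
# confusion-matrix counts will depend on the seed and split chosen.
set.seed(42)
trainIndex  <- sample(seq_len(nrow(iris)), size = 100)
TrainingSet <- iris[trainIndex, ]
TestSet     <- iris[-trainIndex, ]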
# Building the LDA model of iris using the training set
library(MASS)
iris.lda <- lda(Species ~ ., data = TrainingSet)

# Predict the training set using the LDA model
predict.iris <- predict(iris.lda)
table(TrainingSet$Species, predict.iris$class)

# Predict the test set using the LDA model
predictLDATest <- predict(iris.lda, newdata = TestSet)
table(TestSet$Species, predictLDATest$class)

# Output:
Coefficients of linear discriminants:
                    LD1         LD2
Sepal.Length  0.8293776  0.02410215
Sepal.Width   1.5344731  2.16452123
Petal.Length -2.2012117 -0.93192121
Petal.Width  -2.8104603  2.83918785

Confusion matrix of the training set:
             setosa versicolor virginica
  setosa         34          0         0
  versicolor      0         34         0
  virginica       0          1        31

Confusion matrix of the test set:
             setosa versicolor virginica
  setosa         16          0         0
  versicolor      0         15         1
  virginica       0          0        18

From these results we can see that the misclassification rate on the test set is as low as 2% (1 out of 50 samples).
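That rate can be computed directly from the objects above; a small sketch, together with a plot of the training data in the reduced LD1/LD2 space:

# Misclassification rate on the test set (1 error out of 50 samples = 2%).
mean(predictLDATest$class != TestSet$Species)

# Visualize the training data in the two-dimensional discriminant space.
plot(predict.iris$x, col = TrainingSet$Species,
     xlab = "LD1", ylab = "LD2", pch = 19)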
Thus, by applying LDA we could successfully map the data to a lower dimensional space without losing much of its original characteristics.