HMLDude

Reputation: 1637

LDA Producing Fewer Components Than Requested in Python

I am working on the following data set:

http://archive.ics.uci.edu/ml/datasets/Bank+Marketing

The data can be found by clicking on the Data Folder link. There are two data sets present, a training and a testing set. The file I am using contains the combined data from both sets.

I am attempting to apply Linear Discriminant Analysis (LDA) to obtain two components. However, when my code runs, it produces just a single component. I also get just a single component if I set "n_components = 3".

I just got done testing PCA, which works just fine for any number "n" I provide, such that "n" is less than or equal to the number of features present in the X arrays at the time of the transformation.

I am not sure why LDA seems to be behaving so strangely. Here is my code:

#Load libraries
import pandas
import matplotlib.pyplot as plt
from sklearn import model_selection
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

dataset = pandas.read_csv('bank-full.csv', sep=';')  # the file is semicolon-delimited

#Output Basic Dataset Info
print(dataset.shape)
print(dataset.head(20))
print(dataset.describe())

# Split-out validation dataset
X = dataset.iloc[:,[0,5,9,11,12,13,14]] #we are selecting only the "clean data" w/o preprocessing
Y = dataset.iloc[:,16] 
validation_size = 0.20
seed = 7
X_train, X_validation, Y_train, Y_validation = model_selection.train_test_split(X, Y, test_size=validation_size, random_state=seed)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_temp = X_train
X_validation = sc_X.transform(X_validation)

'''# Applying PCA
from sklearn.decomposition import PCA
pca = PCA(n_components = 5)
X_train = pca.fit_transform(X_train)
X_validation = pca.transform(X_validation)
explained_variance = pca.explained_variance_ratio_'''

# Applying LDA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
lda = LDA(n_components = 2)
X_train = lda.fit_transform(X_train, Y_train)
X_validation = lda.transform(X_validation)

Upvotes: 0

Views: 1611

Answers (1)

alexisrozhkov

Reputation: 1632

LDA (at least the implementation in sklearn) can produce at most min(k - 1, n_features) components, where k is the number of classes. So if you are dealing with binary classification (k = 2), you'll end up with only 1 dimension.

Refer to the documentation for more detail: http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html
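
To see the limit in action, here is a minimal sketch on synthetic data (not the bank data set). The helper name lda_output_dims, the sample count, and the random class means are all made up for illustration. It leaves n_components at its default so that sklearn picks the maximum it allows, which should work across versions; note that newer sklearn releases raise an error instead of silently truncating when you request more components than the limit.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
N_SAMPLES, N_FEATURES = 300, 7  # 7 features, as in the question; sample count is arbitrary


def lda_output_dims(n_classes):
    """Fit LDA on synthetic data and return the number of output columns."""
    # Hypothetical labels: cycle through the classes so each one is populated.
    y = np.arange(N_SAMPLES) % n_classes
    # Give every class its own random mean so the class centres are not collinear.
    class_means = rng.normal(scale=3.0, size=(n_classes, N_FEATURES))
    X = rng.normal(size=(N_SAMPLES, N_FEATURES)) + class_means[y]
    # n_components is left at None: sklearn then uses the largest value it permits.
    lda = LinearDiscriminantAnalysis()
    return lda.fit_transform(X, y).shape[1]


two_class_dims = lda_output_dims(2)
three_class_dims = lda_output_dims(3)
print(two_class_dims, three_class_dims)
```

With two classes the transformed data has a single column; it takes at least three classes before a second discriminant axis exists.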

Also related:

Python (scikit learn) lda collapsing to single dimension

LDA ignoring n_components?

Upvotes: 4
