Reputation: 1
I have the following code for training an IDS (intrusion detection system):
import pandas as pd
# Importing the KDDCup99 dataset
dataset = pd.read_csv(r'C:\Users\Ahmad\Desktop\Thiese\KDDCup99.csv',on_bad_lines='skip')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 41:42].values
# Splitting the dataset into training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 0)
''' Data Preprocessing '''
# Applying ColumnTransformer to the categorical columns of X_train and X_test
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers = [('encoder', OneHotEncoder(), [1, 2, 3])], remainder = 'passthrough')
X_train = ct.fit_transform(X_train)
But it keeps failing with the following error:
IndexError: index 1 is out of bounds for axis 0 with size 0
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File ~\untitled0.py:29 in <module>
X_train = ct.fit_transform(X_train)
File ~\anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py:687 in fit_transform
self._validate_column_callables(X)
File ~\anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py:374 in _validate_column_callables
transformer_to_input_indices[name] = _get_column_indices(X, columns)
File ~\anaconda3\lib\site-packages\sklearn\utils\__init__.py:384 in _get_column_indices
raise ValueError(
ValueError: all features must be in [0, -1] or [-0, 0]
What is the problem here?
Upvotes: 0
Views: 670
Reputation: 140
The problem here is in the filename.
To reproduce it, I followed http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html, downloaded the dataset from there, and changed your first line to
dataset = pd.read_csv(r'kddcup.data_corrected',on_bad_lines='skip')
Executing the rest of your code then gives no errors. Caution: the file is in CSV format but its name has no '.csv' extension.
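As a small illustration that a missing '.csv' extension does not by itself stop pd.read_csv from parsing comma-separated text, here is a minimal sketch. The file name and the two rows are made up for the example, and header=None is an assumption that the sample has no header row:

```python
import os
import tempfile

import pandas as pd

# Write a tiny comma-separated file whose name has no '.csv' extension.
# The name and contents are hypothetical, not the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.data_corrected")
    with open(path, "w") as f:
        f.write("0,tcp,http,normal.\n0,udp,domain_u,normal.\n")
    # header=None: treat the first line as data, not as column names
    no_ext = pd.read_csv(path, header=None)

print(no_ext.shape)  # (2, 4)
```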
To avoid such problems in the future, you can check that the data was read correctly with
dataset.shape
Including its output in the question is useful when asking about an error like this.
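As a rough illustration of why the shape matters: the question's code reads the label from column index 41, so it needs at least 42 columns. If the file ends up parsed into a single column, dataset.iloc[:, :-1] has zero columns, which would be consistent with the "size 0" in the traceback. A minimal sketch, where check_min_columns is a hypothetical helper and the in-memory strings are made-up stand-ins for the real file:

```python
import io

import pandas as pd


def check_min_columns(df, min_columns):
    """Fail early if the frame is empty or has fewer columns than expected."""
    n_rows, n_cols = df.shape
    if n_rows == 0 or n_cols < min_columns:
        raise ValueError(
            f"unexpected shape {df.shape}: need at least "
            f"{min_columns} columns and one row"
        )
    return df


# Stand-in for a file that was parsed into a single column
bad = pd.read_csv(io.StringIO("a\n1\n2\n"))
X_bad = bad.iloc[:, :-1].values  # dropping the last column leaves none
print(bad.shape)    # (2, 1)
print(X_bad.shape)  # (2, 0)

# Stand-in for a correctly parsed file (5 columns here, 42 in the question)
good = pd.read_csv(
    io.StringIO("0,tcp,http,SF,normal.\n0,udp,domain_u,SF,normal.\n"),
    header=None,
)
check_min_columns(good, 5)  # passes silently
```

With the real data, calling check_min_columns(dataset, 42) right after read_csv would raise before the ColumnTransformer step, so the failure points at the loading step instead of the encoder.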
Upvotes: 0