Reputation: 3919
I am trying to encode some information to feed into a machine learning model, using the following code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as py
Dataset = pd.read_csv('filename.csv', sep = ',')
X = Dataset.iloc[:,:-1].values
Y = Dataset.iloc[:,18].values
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
onehotencoder = OneHotEncoder(categorical_features = [0])
X = onehotencoder.fit_transform(X).toarray()
However, I am getting an error that reads:
IndexError: single positional indexer is out-of-bounds
Upvotes: 115
Views: 580676
Reputation: 23041
To avoid this error altogether, you can use -1 to index the last column:
X = Dataset.iloc[:,:-1].values # <--- all but the last column
Y = Dataset.iloc[:,-1].values # <--- the last column
or look up the column labels by position and select by label:
X = Dataset[Dataset.columns[:-1]].values # <--- all but the last column
Y = Dataset[Dataset.columns[-1]].values # <--- the last column
Another way is to pop the last column from the dataframe; pop() removes the column from the dataframe it is called on and returns it.
X = Dataset.copy() # <--- copy Dataset into X
Y = X.pop(X.columns[-1]) # <--- pop last column out of X
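As a quick check of the -1 approach, here is a minimal sketch on a small made-up frame. The column names and values are assumptions for illustration only; they are not taken from the CSV in the question.

```python
import pandas as pd

# Hypothetical stand-in for the CSV in the question: two features, one target.
Dataset = pd.DataFrame({
    "colour": ["red", "blue", "red"],
    "size": [1, 2, 3],
    "label": [0, 1, 0],
})

X = Dataset.iloc[:, :-1].values  # every column except the last
Y = Dataset.iloc[:, -1].values   # the last column, whatever its position

print(X.shape, Y.shape)
```

Because -1 always refers to the last column, this keeps working if columns are added to or removed from the file.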
Upvotes: 0
Reputation: 330
Other answers have already explained the issue and the solution: this code is trying to access an element that does not exist.
However, as a sanity check before applying iloc, make sure the dataframe's index is in the expected order. To do this, you can simply run:
df.reset_index(inplace=True)
(Or)
df = df.reset_index()
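A small sketch of what resetting the index does, on made-up data. Note that iloc is positional, so it ignores the index labels either way; the reset mainly matters for label-based access. Also, a plain reset_index() keeps the old index as a new first column, which shifts every column position by one, so this sketch passes drop=True (my choice, not part of the answer above).

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30, 40]})
subset = df[df["value"] > 20]          # keeps the original labels 2 and 3

first_by_position = subset.iloc[0]     # positional: works regardless of labels
labels_before = list(subset.index)

# drop=True discards the old index instead of adding it as a new column
subset = subset.reset_index(drop=True)
labels_after = list(subset.index)

print(labels_before, labels_after)
```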
Upvotes: 3
Reputation: 151
This error also shows up when the input file (a CSV in this case) is itself empty. Again, it boils down to the same problem: the code is trying to access data from a row or column that doesn't exist.
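A minimal sketch of that situation, assuming a CSV that has a header row but no data rows (the content here is made up). Checking df.empty or df.shape before indexing makes the cause obvious.

```python
import io
import pandas as pd

# A CSV that has a header row but no data rows (hypothetical content).
csv_text = "a,b,c\n"
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # zero rows, three columns
print(df.empty)

try:
    df.iloc[0]    # there is no first row to return
    raised = False
except IndexError:
    raised = True

print(raised)
```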
Upvotes: 0
Reputation: 9522
This does not solve the question above, but for anyone who lands here because of the error rather than the example: I got IndexError: single positional indexer is out-of-bounds
when I tried to find a row in dataframe2 while looping over the rows of dataframe1, using many criteria in the filter on dataframe2, and adding each found row to a new, empty dataframe3 (do not ask me why!). One of the values in the row was NaN in both dataframe1 and dataframe2. With that value present, I could neither filter nor add a new row.
Solution:
dataframe1 = dataframe1.fillna("nan") # or whatever you want as a fill value; fillna returns a new dataframe
dataframe2 = dataframe2.fillna("nan")
and the script ran through without the error.
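One plausible mechanism behind this, sketched on made-up data: NaN never compares equal to NaN, so an equality filter on a missing value matches nothing, and taking .iloc[0] of the empty result raises the error. The lookup helper and the column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one key is missing in both frames.
keys = pd.Series(["a", np.nan], dtype=object)
dataframe1 = pd.DataFrame({"key": keys, "x": [1, 2]})
dataframe2 = pd.DataFrame({"key": keys, "y": [10, 20]})

def lookup(frame, key):
    """Return the first row of frame whose 'key' column equals key."""
    return frame[frame["key"] == key].iloc[0]

try:
    lookup(dataframe2, dataframe1.loc[1, "key"])  # NaN == NaN is False
    failed_before = False
except IndexError:
    failed_before = True

# fillna returns a new frame, so the result has to be assigned back.
dataframe1 = dataframe1.fillna("nan")
dataframe2 = dataframe2.fillna("nan")

row = lookup(dataframe2, dataframe1.loc[1, "key"])
print(failed_before, row["y"])
```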
Upvotes: 5
Reputation: 36594
This happens when you index a row or column with a position that lies beyond the dimensions of your dataframe. For instance, getting the eleventh column when you have only three.
import pandas as pd
df = pd.DataFrame({'Name': ['Mark', 'Laura', 'Adam', 'Roger', 'Anna'],
                   'City': ['Lisbon', 'Montreal', 'Lisbon', 'Berlin', 'Glasgow'],
                   'Car': ['Tesla', 'Audi', 'Porsche', 'Ford', 'Honda']})
You have five rows and three columns:
    Name      City      Car
0   Mark    Lisbon    Tesla
1  Laura  Montreal     Audi
2   Adam    Lisbon  Porsche
3  Roger    Berlin     Ford
4   Anna   Glasgow    Honda
Let's try to index the eleventh column (it doesn't exist):
df.iloc[:, 10] # there is obviously no 11th column
IndexError: single positional indexer is out-of-bounds
If you are a beginner with Python, remember that positions are zero-based, so df.iloc[:, 10] refers to the eleventh column.
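A short sketch of which positions are valid, using a two-row version of the frame above. It also shows two details not covered in the text: negative positions count from the end, and a slice that runs past the end comes back empty instead of raising.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Mark", "Laura"],
                   "City": ["Lisbon", "Montreal"],
                   "Car": ["Tesla", "Audi"]})

n_cols = df.shape[1]                 # three columns, so valid positions are 0, 1, 2

last_col = df.iloc[:, n_cols - 1]    # the third column, "Car"
same_col = df.iloc[:, -1]            # negative positions count from the end

# A slice past the end does not raise; it just comes back empty.
empty_slice = df.iloc[:, 10:]

try:
    df.iloc[:, n_cols]               # position 3 is one past the end
    raised = False
except IndexError:
    raised = True

print(n_cols, empty_slice.shape, raised)
```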
Upvotes: 33
Reputation: 1710
This error is caused by:
Y = Dataset.iloc[:,18].values
The index is out of bounds here, most probably because there are fewer than 19 columns in your Dataset, so column 18 does not exist. The rest of the code you provided doesn't use Y at all, so you can just comment out this line for now.
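To confirm this on your own data, you could inspect the shape and list each column next to its position. This is only a sketch: the frame and the "target" column name are made up, since the real CSV is not shown.

```python
import pandas as pd

# Hypothetical frame standing in for the CSV; the real one has its own columns.
Dataset = pd.DataFrame({"f1": [1, 2], "f2": [3, 4], "target": [0, 1]})

n_rows, n_cols = Dataset.shape
print(f"{n_cols} columns, valid positions 0..{n_cols - 1}")

# List each column next to its position to see which one holds the target.
for position, name in enumerate(Dataset.columns):
    print(position, name)

# Look the position up by name instead of hard-coding a number such as 18.
target_position = Dataset.columns.get_loc("target")
Y = Dataset.iloc[:, target_position].values
```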
Upvotes: 128