Asif.Khan

Reputation: 147

"IndexError: too many indices for array" appearing when creating a train test split

I am building a neural network and am currently working on the train/test split, using:

import csv
import math
import numpy as np 
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
import datetime
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

X1 = Values[1:16801] #16,800 values

train_size = int(len(X1) * 0.67)
test_size = len(X1) - train_size

train, test = X1[0:train_size,:], X1[train_size:len(X1),:]
print(len(train), len(test))

I have 16,800 values for X1 which look like:

[0.03454225 0.02062136 0.00186715 ... 0.92857565 0.64930691 0.20325924]

My traceback error message is:

IndexError                                Traceback (most recent call last)
<ipython-input-16-8cadae12af2c> in <module>()
     69 test_size = len(X1) - train_size
     70 
---> 71 train, test = X1[0:train_size,:], X1[train_size:len(X1),:]
     72 print(len(train), len(test))
     73 

IndexError: too many indices for array

I'm not sure why this is happening. If anyone can help, it would be really appreciated.

Upvotes: 1

Views: 242

Answers (1)

Kumar

Reputation: 776

The error comes from this line:

train, test = X1[0:train_size,:], X1[train_size:len(X1),:]

Your data is a one-dimensional array, as the sample posted in your question shows, but this line also indexes a second dimension. Here 0:train_size selects indices 0 through train_size - 1 along the first dimension, and : selects every index along a second dimension, which your data does not have. Dropping the second index should work:

import csv
import math
import numpy as np 
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
import datetime
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

# 1D array
X1 = Values[1:16801] #16,800 values

train_size = int(len(X1) * 0.67)
test_size = len(X1) - train_size

train, test = X1[0:train_size], X1[train_size:len(X1)]
print(len(train), len(test))
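
To check this independently of the original data, here is a minimal self-contained sketch. It assumes X1 is a 1D NumPy array; the random values are only a stand-in for Values, which is not shown in the question. It reproduces the error, applies the 1D slice, and shows the reshape you would need if a later step expects a 2D (samples, features) array.

```python
import numpy as np

# Hypothetical stand-in for the asker's data: 16,800 values in a 1D array.
rng = np.random.default_rng(0)
X1 = rng.random(16800)

print(X1.ndim, X1.shape)  # one dimension only

train_size = int(len(X1) * 0.67)

# Indexing a 1D array with two indices reproduces the error from the question.
try:
    X1[0:train_size, :]
except IndexError as exc:
    print("IndexError:", exc)

# A 1D array takes exactly one index, so slice without the trailing ", :".
train, test = X1[:train_size], X1[train_size:]
print(len(train), len(test))

# If a later step needs shape (samples, features), add the second axis
# explicitly; after that, the two-index slice is valid.
X2 = X1.reshape(-1, 1)
train_2d, test_2d = X2[:train_size, :], X2[train_size:, :]
print(train_2d.shape, test_2d.shape)
```

Printing X1.ndim or X1.shape before slicing is a quick way to tell how many indices the array accepts.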

Upvotes: 2
