Reputation: 429
I am really new to the keras library and to Python. I am trying to import an Excel file using pandas and convert it to a numpy.ndarray using the as_matrix() function of pandas. But it seems to read my file incorrectly: I have a 90x1049 data set in the Excel file, but when I try to convert it into a numpy array it reads my data as 89x1049. I am using the following code, which is not working:
training_data_x = pd.read_excel("/home/workstation/ANN/new_input.xlsx")
X_train = training_data_x.as_matrix()
Upvotes: 3
Views: 48390
Reputation: 52949
What probably happens is that your Excel file has no header row, so pandas.read_excel consumes your first data row as the header.
I tried creating an xlsx containing
1 2 3
2 3 4
3 4 5
4 5 6
5 6 7
6 7 8
7 8 9
8 9 10
9 10 11
10 11 12
Reading that resulted in
In [3]: df = pandas.read_excel('test.xlsx')
In [4]: df
Out[4]:
    1   2   3
0   2   3   4
1   3   4   5
2   4   5   6
3   5   6   7
4   6   7   8
5   7   8   9
6   8   9  10
7   9  10  11
8  10  11  12
As can be seen, the first data row has been used as labels for columns.
To avoid consuming the first data row as headers, pass header=None to read_excel. Interestingly, the documentation did not mention this usage before, but it has since been fixed:
header : int, list of ints, default 0
Row (0-indexed) to use for the column labels of the parsed DataFrame. If a list of integers is passed those row positions will be combined into a MultiIndex. Use None if there are no headers.
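The effect of header=None on the row count can be checked without an Excel file. The following is only a sketch: it swaps in pandas.read_csv on an in-memory string (the variable names raw, with_default and with_none are made up for illustration), on the assumption that its header argument treats the first row the same way read_excel does.

```python
import io

import pandas as pd

# Ten rows of three values with no header row (same numbers as the test sheet above).
raw = "\n".join(f"{i},{i + 1},{i + 2}" for i in range(1, 11))

# Default behaviour: the first row becomes the column labels, so one data row is lost.
with_default = pd.read_csv(io.StringIO(raw))

# header=None: pandas generates integer column labels and keeps every row as data.
with_none = pd.read_csv(io.StringIO(raw), header=None)

print(with_default.shape)  # (9, 3)
print(with_none.shape)     # (10, 3)
```

The same one-row difference would explain a 90x1049 sheet showing up as 89x1049.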
Upvotes: 4
Reputation: 44545
If you have no header, try the following:
training_data = pd.read_excel("/home/workstation/ANN/new_input.xlsx", header=None)
X_train = training_data.as_matrix()
See also answers from a previous question.
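Note for newer pandas versions: as_matrix() was deprecated and is no longer available from pandas 1.0 onwards, where to_numpy() (or the older .values attribute) does the same conversion. A minimal sketch, using a small hand-built DataFrame as a stand-in for the frame that read_excel would return:

```python
import numpy as np
import pandas as pd

# Stand-in for the result of pd.read_excel(..., header=None); the values are illustrative.
training_data = pd.DataFrame([[1, 2, 3], [2, 3, 4], [3, 4, 5]])

# to_numpy() returns the frame's values as a numpy.ndarray.
X_train = training_data.to_numpy()

print(type(X_train), X_train.shape)
```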
Upvotes: 2