Rian Zaman

Reputation: 429

How to convert Excel file data into a numpy array using pandas?

I am really new to the keras library and also to Python. I am trying to import an Excel file using pandas and convert it to a numpy.ndarray using the as_matrix() method of the DataFrame. But it seems to read my file wrong. I have a 90x1049 data set in the Excel file, but when I convert it into a numpy array it reads my data as 89x1049. I am using the following code, which is not working:

training_data_x = pd.read_excel("/home/workstation/ANN/new_input.xlsx")
X_train = training_data_x.as_matrix()

Upvotes: 3

Views: 48390

Answers (2)

Ilja Everilä

Reputation: 52949

Probably what happens is that your Excel file has no header row and so pandas.read_excel consumes your first data row as such.

I tried creating an xlsx containing

1   2   3
2   3   4
3   4   5
4   5   6
5   6   7
6   7   8
7   8   9
8   9   10
9   10  11
10  11  12

Reading that resulted in

In [3]: df = pandas.read_excel('test.xlsx')

In [4]: df
Out[4]: 
    1   2   3
0   2   3   4
1   3   4   5
2   4   5   6
3   5   6   7
4   6   7   8
5   7   8   9
6   8   9  10
7   9  10  11
8  10  11  12

As can be seen, the first data row has been used as labels for columns.

To avoid consuming the first data row as headers, pass header=None to read_excel. Interestingly, the documentation did not mention this usage before, but that has been fixed since:

header : int, list of ints, default 0

Row (0-indexed) to use for the column labels of the parsed DataFrame. If a list of integers is passed those row positions will be combined into a MultiIndex. Use None if there are no headers.
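
A minimal sketch of the difference header=None makes. To avoid depending on an Excel engine being installed, it uses pandas.read_csv on an in-memory string as a stand-in; read_csv accepts the same header argument, and the sample data is made up for illustration.

```python
import io

import pandas as pd

# Three data rows, no header row (hypothetical sample data).
raw = "1,2,3\n2,3,4\n3,4,5\n"

# Default header=0: the first row is consumed as column labels.
with_default = pd.read_csv(io.StringIO(raw))

# header=None: every row is kept as data; columns are labelled 0, 1, 2.
with_none = pd.read_csv(io.StringIO(raw), header=None)

print(with_default.shape)  # (2, 3)
print(with_none.shape)     # (3, 3)
```

With read_excel the call would look the same apart from the file path, e.g. pandas.read_excel('test.xlsx', header=None).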

Upvotes: 4

pylang

Reputation: 44545

If you have no header, try the following:

training_data_x = pd.read_excel("/home/workstation/ANN/new_input.xlsx", header=None)

X_train = training_data_x.as_matrix()

See also answers from a previous question.
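
Note that DataFrame.as_matrix() was deprecated and is no longer available in pandas 1.0 and later; on current versions DataFrame.to_numpy() gives the same kind of result. A small sketch, using a made-up in-memory frame as a stand-in for what read_excel(..., header=None) would return:

```python
import numpy as np
import pandas as pd

# Stand-in for pd.read_excel(path, header=None); the values are made up.
training_data_x = pd.DataFrame([[1, 2, 3], [2, 3, 4], [3, 4, 5]])

# to_numpy() replaces the removed as_matrix().
X_train = training_data_x.to_numpy()

print(type(X_train).__name__, X_train.shape)  # ndarray (3, 3)
```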

Upvotes: 2
