Reputation: 3213
I have the following numpy array:
numpy_x.shape
(9982, 26)
numpy_x has 9982 records/observations (rows) and 26 columns. Is that right?
numpy_x[:]
array([[0.00000000e+00, 9.60000000e-01, 1.00000000e+00, ...,
1.20000000e+00, 6.90000000e-01, 1.17000000e+00],
[1.00000000e+00, 9.60000000e-01, 1.00000000e+00, ...,
1.20000000e+00, 7.00000000e-01, 1.17000000e+00],
[2.00000000e+00, 9.60000000e-01, 1.00000000e+00, ...,
1.20000000e+00, 7.00000000e-01, 1.17000000e+00],
...,
[9.97900000e+03, 6.10920994e-01, 7.58135980e-01, ...,
1.08704204e+00, 7.88187535e-01, 1.23021669e+00],
[9.98000000e+03, 6.10920994e-01, 7.58135980e-01, ...,
1.08704204e+00, 7.88187535e-01, 1.23021669e+00],
[9.98100000e+03, 6.10920994e-01, 7.58135980e-01, ...,
1.08704204e+00, 7.88187535e-01, 1.23021669e+00]])
I want to generate a dataframe from the numpy_x data, with index and columns (are index and columns really the same thing?), so I tried the following:
import pandas as pd
pd.DataFrame(data=numpy_x[:], # I want pass the entire numpy array content
index=numpy_x[1:26],
columns=numpy_x[9982:26])
But I get the following error:
/.conda/envs/x/lib/python3.6/site-packages/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
4606 raise ValueError("Empty data passed with indices specified.")
4607 raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4608 passed, implied))
4609
4610
ValueError: Shape of passed values is (26, 9982), indices imply (0, 25)
How can I understand which values to pass to the index and columns parameters?
Upvotes: 0
Views: 811
Reputation: 9081
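Use:
import numpy as np
import pandas as pd
numpy_x=np.random.random((100,10))  # example array with 100 rows and 10 columns
df=pd.DataFrame(numpy_x)  # row and column labels default to 0..n-1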
Output (first five rows shown)
0 1 2 3 4 5 6 \
0 0.204839 0.837503 0.696896 0.235414 0.594766 0.521302 0.841167
1 0.041490 0.679537 0.657314 0.656672 0.524983 0.936918 0.482802
2 0.318928 0.423196 0.218037 0.515017 0.107851 0.564404 0.218297
3 0.644913 0.433771 0.297033 0.011239 0.346021 0.353749 0.587631
4 0.127949 0.517230 0.969399 0.743442 0.268566 0.415327 0.567572
7 8 9
0 0.882685 0.211414 0.659820
1 0.752496 0.047198 0.775250
2 0.521580 0.655942 0.178753
3 0.123761 0.483601 0.157191
4 0.849218 0.098588 0.754402
I want to generate a dataframe from the numpy_x data, with index and columns (are index and columns really the same thing?)
Yes and no. Index is simply the axis labelling information in pandas. Depending on the axis, Index can mean either row labels or column labels.
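To make the "yes and no" concrete, here is a minimal sketch (the small zero-filled array is only an illustration, not the data from the question) showing that the row labels and the column labels of a DataFrame are both pandas Index objects, one per axis:

```python
import numpy as np
import pandas as pd

# Illustrative array with 3 rows and 2 columns (not the question's data).
df = pd.DataFrame(np.zeros((3, 2)))

# Each axis carries its own Index object:
# df.index labels the rows, df.columns labels the columns.
print(isinstance(df.index, pd.Index))    # row labels
print(isinstance(df.columns, pd.Index))  # column labels

# The lengths follow the shape of the data: 3 row labels, 2 column labels.
print(len(df.index), len(df.columns))
```

So both are the same kind of object, but they describe different axes and must have different lengths unless the data happens to be square.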
The axis labeling information in pandas objects serves many purposes. An index can be a simple single-level integer index, or it can be a MultiIndex.
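As a rough illustration of the difference, the sketch below builds the same small array once with the default integer index and once with a two-level MultiIndex. The level names "group" and "item" and their values are made up for the example:

```python
import numpy as np
import pandas as pd

# Illustrative data: 4 rows, 2 columns.
data = np.arange(8).reshape(4, 2)

# Default: a simple integer index 0..3.
df_simple = pd.DataFrame(data)

# Hypothetical two-level labels: every combination of group and item.
multi = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["group", "item"])
df_multi = pd.DataFrame(data, index=multi)

print(df_simple.index.nlevels)  # number of index levels for the simple case
print(df_multi.index.nlevels)   # number of index levels for the MultiIndex
print(df_multi.loc["a"])        # the rows belonging to group "a"
```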
Index and Columns Parameters
The columns parameter is simply the column labels that you want to give your dataset. In this case you would pass 26 names for the 26 columns in your numpy array. It defaults to np.arange(n) if no column labels are provided.
The index parameter is simply the Index (row labels) to use for the resulting frame. It defaults to np.arange(n) if the input data carries no indexing information and no index is provided (which is the case in my example).
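Applied to an array shaped like the one in the question, this might look like the sketch below. The random array is a stand-in for the real numpy_x, and the col_0 ... col_25 names are hypothetical labels chosen only for illustration. The point is that index needs one label per row and columns needs one label per column:

```python
import numpy as np
import pandas as pd

# Stand-in for the question's array: same shape, random values.
numpy_x = np.random.random((9982, 26))
n_rows, n_cols = numpy_x.shape

# The slices from the question are pieces of the data, not labels:
print(numpy_x[1:26].shape)     # 25 rows of data, not 9982 row labels
print(numpy_x[9982:26].shape)  # empty slice, so zero column labels

# Hypothetical labels: one per row and one per column.
row_labels = np.arange(n_rows)
col_labels = ["col_{}".format(i) for i in range(n_cols)]

df = pd.DataFrame(data=numpy_x, index=row_labels, columns=col_labels)
print(df.shape)
```

The two slice shapes also explain the "indices imply (0, 25)" part of the error message: the passed index had 25 entries and the passed columns had none, which cannot match data with 9982 rows and 26 columns.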
Upvotes: 1