I am teaching myself about pandas DataFrames and ran into an error I can't explain. I tried creating two nearly identical DataFrame objects, the second one with one additional column holding the same number of values as the other entries, but creating the second one throws an error.
Any ideas on why that would happen? I do have an alternative way of creating both: a dictionary whose keys map to the corresponding lists of values. That method works regardless of how much data or how many columns I use. I'm just interested in why such a simple change causes a failure.
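A minimal sketch of that dictionary approach, assuming the intended layout is four columns of three values each (the actual dictionary isn't shown in the question, so these values are illustrative):

```python
import pandas as pd

# Illustrative reconstruction: each key is a column name and each list
# holds that column's values, so the lists run down the columns.
data = {
    'Column_1': [1, 2, 3],
    'Column_2': [4, 5, 6],
    'Column_3': [7, 8, 9],
    'Column_4': [10, 11, 12],
}
df2 = pd.DataFrame(data)
print(df2.shape)  # (3, 4): 3 rows, 4 columns
```

Because each list is tied to a column label, adding a fourth key adds a fourth column, which is why this form keeps working as columns are added.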
Printing df1 works as expected, but adding the code for df2, whether or not the df1 code is also in the file, produces the traceback listed below.
I'm using Python 3.6.5 and Pandas 0.23.1
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([[1, 2, 3],[4, 5, 6],[7, 8, 9]]), columns=['Column_1','Column_2','Column_3'])
df2 = pd.DataFrame(np.array([[1, 2, 3],[4, 5, 6],[7, 8, 9],[10, 11, 12]]), columns=['Column_1','Column_2','Column_3','Column_4'])
Here is the traceback I get when instantiating df2:
Traceback (most recent call last):
File "C:\Program Files\Python36-32\lib\site-packages\pandas\core\internals.py", line 4857, in create_block_manager_from_blocks
placement=slice(0, len(axes[0])))]
File "C:\Program Files\Python36-32\lib\site-packages\pandas\core\internals.py", line 3205, in make_block
return klass(values, ndim=ndim, placement=placement)
File "C:\Program Files\Python36-32\lib\site-packages\pandas\core\internals.py", line 125, in __init__
'{mgr}'.format(val=len(self.values), mgr=len(self.mgr_locs)))
ValueError: Wrong number of items passed 3, placement implies 4
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\Steven\Documents\Python programs\temp.py", line 95, in <module>
columns=['Column_1','Column_2','Column_3','Column_4'])
File "C:\Program Files\Python36-32\lib\site-packages\pandas\core\frame.py", line 379, in __init__
copy=copy)
File "C:\Program Files\Python36-32\lib\site-packages\pandas\core\frame.py", line 536, in _init_ndarray
return create_block_manager_from_blocks([values], [columns, index])
File "C:\Program Files\Python36-32\lib\site-packages\pandas\core\internals.py", line 4866, in create_block_manager_from_blocks
construction_error(tot_items, blocks[0].shape[1:], axes, e)
File "C:\Program Files\Python36-32\lib\site-packages\pandas\core\internals.py", line 4843, in construction_error
passed, implied))
ValueError: Shape of passed values is (3, 4), indices imply (4, 4)
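You are actually adding a row, not a column. Each inner list in the nested array becomes one row, so the fourth list gives you an array of shape (4, 3) (4 rows, 3 columns) instead of (3, 4), and you are then trying to assign 4 column names to just 3 columns, hence the error. (In this pandas version the message reports shapes as (columns, rows): the values supply 3 columns of 4 rows, while the column labels and index imply 4 columns of 4 rows.)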
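A quick way to confirm this, sketched with illustrative variable names (`arr`, `columns`, `raised`): check the array's shape before handing it to the constructor. The exact error text differs between pandas versions, so the sketch only relies on a `ValueError` being raised.

```python
import numpy as np
import pandas as pd

# Each inner list becomes one row of the 2-D array.
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
columns = ['Column_1', 'Column_2', 'Column_3', 'Column_4']

n_rows, n_cols = arr.shape
print(f"{n_rows} rows, {n_cols} columns, {len(columns)} labels")

# The constructor needs exactly one label per column, so 4 labels
# for 3 columns is rejected.
try:
    pd.DataFrame(arr, columns=columns)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)  # wording depends on the pandas version
```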
pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['Column_1', 'Column_2', 'Column_3'])
Out[73]:
Column_1 Column_2 Column_3
0 1 2 3
1 4 5 6
2 7 8 9
pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]))
Out[74]:
0 1 2
0 1 2 3
1 4 5 6
2 7 8 9
3 10 11 12
You need to either pass your data so that each inner list is a row of four values, or pass your column names as the index and then transpose the DataFrame.
pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]), index=['Column_1', 'Column_2', 'Column_3', 'Column_4']).T
Out[75]:
Column_1 Column_2 Column_3 Column_4
0 1 4 7 10
1 2 5 8 11
2 3 6 9 12
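For the first option, a minimal sketch (not part of the original answer) that reshapes the input rather than the DataFrame: transposing the NumPy array with `arr.T` turns each original inner list into a column, and building from `dict(zip(columns, arr))` does the same through the dictionary route mentioned in the question. The names `arr`, `columns`, `df_a` and `df_b` are illustrative.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
columns = ['Column_1', 'Column_2', 'Column_3', 'Column_4']

# Option 1: transpose the array first, giving shape (3, 4),
# so the 4 labels now match the 4 columns.
df_a = pd.DataFrame(arr.T, columns=columns)

# Option 2: pair each label with one inner list; dict keys become columns.
df_b = pd.DataFrame(dict(zip(columns, arr)))

print(df_a)
```

Both produce the same table as the `index=...` plus `.T` approach; transposing the array avoids building an intermediate DataFrame with the labels on the wrong axis.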