Reputation: 1207
Suppose I have a simple data frame:
import pandas as pd
a = pd.DataFrame([[0,1], [2,3]])
I can slice this data frame very easily: the first column is a[[0]], the second is a[[1]].
Now, let's take a more complex data frame. This is part of my code:
frame = pd.DataFrame(range(100), columns=["Variable"], index=["_".join(["loc", str(i)]) for i in range(1, 101)])
frame[1] = [i**3 for i in range(100)]
frame is also a pandas DataFrame. I can get the second column by frame[[1]]. But when I try frame[[0]], I get an error:
Traceback (most recent call last):
File "<ipython-input-55-0c56ffb47d0d>", line 1, in <module>
frame[[0]]
File "C:\Users\Robert\Desktop\Záloha\WinPython-64bit-3.5.2.2\python- 3.5.2.amd64\lib\site-packages\pandas\core\frame.py", line 1991, in __getitem__
return self._getitem_array(key)
File "C:\Users\Robert\Desktop\Záloha\WinPython-64bit-3.5.2.2\python- 3.5.2.amd64\lib\site-packages\pandas\core\frame.py", line 2035, in _getitem_array
indexer = self.ix._convert_to_indexer(key, axis=1)
File "C:\Users\Robert\Desktop\Záloha\WinPython-64bit-3.5.2.2\python- 3.5.2.amd64\lib\site-packages\pandas\core\indexing.py", line 1184, in _convert_to_indexer
indexer = labels._convert_list_indexer(objarr, kind=self.name)
File "C:\Users\Robert\Desktop\Záloha\WinPython-64bit-3.5.2.2\python- 3.5.2.amd64\lib\site-packages\pandas\indexes\base.py", line 1112, in _convert_list_indexer
return maybe_convert_indices(indexer, len(self))
File "C:\Users\Robert\Desktop\Záloha\WinPython-64bit-3.5.2.2\python- 3.5.2.amd64\lib\site-packages\pandas\core\indexing.py", line 1856, in maybe_convert_indices
raise IndexError("indices are out-of-bounds")
IndexError: indices are out-of-bounds
I can still use frame.iloc[:,0], but the problem is that I don't understand why I can't use simple slicing by [[]]. I use WinPython with Spyder 3.
Upvotes: 18
Views: 83462
Reputation: 23449
[] is a wrapper for __getitem__(), which selects by label, and as @epattaro explained, there's no column label 0 in the dataframe created as in the OP. To select a column (or row) by position, the canonical way is via iloc.
df.iloc[:, 0] # select first column as a Series
df.iloc[:, [0]] # select first column as a single column DataFrame
df.iloc[0] # select first row as a Series
df.iloc[[0]] # select first row as a single row DataFrame
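A minimal sketch of the difference, assuming a small stand-in for the frame in the question (3 rows instead of 100, chosen only for illustration). Looking up the label 0 fails because the column labels are "Variable" and 1, while position 0 works. The exception type raised by the failed lookup differs between pandas versions, so the sketch catches both KeyError and IndexError.

```python
import pandas as pd

# Small stand-in for the frame in the question (3 rows instead of 100).
frame = pd.DataFrame(list(range(3)), columns=["Variable"],
                     index=["loc_1", "loc_2", "loc_3"])
frame[1] = [i**3 for i in range(3)]

by_label = frame[[1]]             # works: 1 is an existing column label
by_position = frame.iloc[:, [0]]  # works: position 0 is the "Variable" column

try:
    frame[[0]]                    # fails: no column is labeled 0
    label_zero_found = True
except (KeyError, IndexError):    # exception type depends on the pandas version
    label_zero_found = False

print(list(by_label.columns), list(by_position.columns), label_zero_found)
```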
Yet another method is take():
df.take([0], axis=1) # select first column
df.take([0]) # select first row
You can verify that for any df, df.take([0], axis=1).equals(df.iloc[:, [0]]) returns True.
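As a quick check of that equivalence, here is a sketch on a made-up frame (the column names "x" and "y" and the index labels are illustrative only, not from the question):

```python
import pandas as pd

# Hypothetical frame with non-integer labels on both axes.
df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]},
                  index=["a", "b", "c"])

# take() and iloc both select by position, so the results should match.
same_column = df.take([0], axis=1).equals(df.iloc[:, [0]])
same_row = df.take([0]).equals(df.iloc[[0]])

print(same_column, same_row)
```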
Upvotes: 1
Reputation: 2448
Using your code:
import pandas as pd
var_vec = [i for i in range(100)]
num_of_sites = 100
row_names = ["_".join(["loc", str(i)]) for i in range(1, num_of_sites + 1)]
frame = pd.DataFrame(var_vec, columns = ["Variable"], index = row_names)
spec_ab = [i**3 for i in range(100)]
frame[1] = spec_ab
If you print 'frame', you get:
       Variable    1
loc_1         0    0
loc_2         1    1
loc_3         2    8
loc_4         3   27
loc_5         4   64
loc_6         5  125
......
So the cause of your problem becomes obvious: you have no column labeled 0. You first define a list called var_vec and then build a dataframe out of that list, specifying both the index values and the column name (which is usually good practice). The numeric column labels 0, 1, ... seen in your first example are assigned only when you don't specify column names; they are labels, not positional indexers.
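To make this concrete, here is a small sketch comparing the column labels of a frame built without names and one built with a name (shortened data, used only for illustration):

```python
import pandas as pd

# No column names given: pandas assigns the default integer labels 0 and 1.
unnamed = pd.DataFrame([[0, 1], [2, 3]])

# Column name given: the only labels are the ones you supply or add later.
named = pd.DataFrame([0, 1, 2], columns=["Variable"])
named[1] = [0, 1, 8]

unnamed_labels = list(unnamed.columns)
named_labels = list(named.columns)

print(unnamed_labels)        # default integer labels
print(named_labels)          # the supplied name plus the added label 1
print(0 in named.columns)    # whether a column labeled 0 exists
```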
If you want to access columns by their position, you can:
df[df.columns[0]]
What happens then is that you get the column labels of the df, pick the label at position 0, and pass that label to the df as a reference.
Hope that helps you understand.
Edit:
Another (better) way would be:
df.iloc[:,0]
where ":" stands for all rows (rows are likewise indexed by position, from 0 up to the number of rows minus 1).
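A short sketch comparing the two approaches on a made-up three-row frame (the data is illustrative only). Both return the same Series here; note that if the column labels contained duplicates, df[df.columns[0]] would return every column with that label, whereas iloc always returns exactly one position.

```python
import pandas as pd

# Hypothetical frame with the same column labels as in the question.
df = pd.DataFrame({"Variable": [0, 1, 2], 1: [0, 1, 8]},
                  index=["loc_1", "loc_2", "loc_3"])

via_label = df[df.columns[0]]   # look up the label stored at position 0
via_position = df.iloc[:, 0]    # select position 0 directly
first_two_rows = df.iloc[0:2, 0]  # rows can be sliced by position as well

print(via_label.equals(via_position))
print(list(first_two_rows))
```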
Upvotes: 19