Reputation:
I am curious as to why df[2] is not supported, while df.ix[2] and df[2:3] both work.
In [26]: df.ix[2]
Out[26]:
A 1.027680
B 1.514210
C -1.466963
D -0.162339
Name: 2000-01-03 00:00:00
In [27]: df[2:3]
Out[27]:
A B C D
2000-01-03 1.02768 1.51421 -1.466963 -0.162339
I would expect df[2] to work the same way as df[2:3], to be consistent with Python indexing convention. Is there a design reason for not supporting indexing a row by a single integer?
Upvotes: 588
Views: 1356740
Reputation: 23111
If you want to index multiple rows by their integer indexes, use a list of indexes:
idx = [2,3,1]
df.iloc[idx]
N.B. If idx is created using some rule, you can also sort the DataFrame with .iloc (or with .loc, if idx holds labels rather than positions), because the output is ordered by idx. So in a sense, iloc can act like a sorting function where idx is the sorting key.
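A small sketch of this reordering behavior on a made-up DataFrame (the column name A and the string labels are hypothetical). The "rule" here is a descending sort of column A computed with NumPy's argsort:

```python
import pandas as pd

# Hypothetical frame with string labels so positions and labels differ
df = pd.DataFrame({"A": [10, 20, 30, 40]}, index=["w", "x", "y", "z"])

idx = [2, 3, 1]          # integer positions, in the desired output order
picked = df.iloc[idx]    # rows come back in exactly the order given by idx
print(picked)

# idx built by a rule: positions of A in descending order
order = df["A"].to_numpy().argsort()[::-1]
desc = df.iloc[order]    # iloc acting as a sort keyed on `order`
print(desc.index.tolist())
```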
Upvotes: 5
Reputation: 133
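I would normally go for .loc/.iloc as suggested by Ted, but one may also select a row by transposing the DataFrame. To stay with the example above, df.T[2] gives you the row of df labelled 2 (the transpose turns row labels into column labels, so this is a label lookup, not a positional one).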
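A minimal check of the transpose trick on a hypothetical frame whose labels (0, 2, 4) differ from its positions (0, 1, 2). df.T[2] looks up the column labelled 2 in the transposed frame, so it matches df.loc[2], not df.iloc[2]:

```python
import pandas as pd

# Hypothetical frame: labels 0, 2, 4 on positions 0, 1, 2
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}, index=[0, 2, 4])

via_transpose = df.T[2]   # column labelled 2 of the transpose
via_loc = df.loc[2]       # row labelled 2
via_iloc = df.iloc[2]     # row at position 2 (its label is 4)

print(via_transpose.equals(via_loc))   # True
print(via_transpose.equals(via_iloc))  # False
```

One caveat of this route: transposing a frame whose columns have different dtypes produces a common (often object) dtype, so .loc/.iloc are usually the cheaper and safer choice.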
Upvotes: 4
Reputation: 128948
Echoing @HYRY, see the new docs in 0.11:
http://pandas.pydata.org/pandas-docs/stable/indexing.html
There are new indexers here: .iloc, which explicitly supports only integer (positional) indexing, and .loc, which explicitly supports only label indexing.
For example, imagine this scenario:
In [1]: df = pd.DataFrame(np.random.randn(5,2),index=range(0,10,2),columns=list('AB'))
In [2]: df
Out[2]:
A B
0 1.068932 -0.794307
2 -0.470056 1.192211
4 -0.284561 0.756029
6 1.037563 -0.267820
8 -0.538478 -0.800654
In [5]: df.iloc[[2]]
Out[5]:
A B
4 -0.284561 0.756029
In [6]: df.loc[[2]]
Out[6]:
A B
2 -0.470056 1.192211
[] with a slice selects rows only (integer slices are positional, while label slices such as strings or dates use the index labels)
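A deterministic variant of the scenario above (fixed values instead of random ones, so the result is reproducible) contrasting position-based and label-based access to "row 2":

```python
import pandas as pd

# Same layout as the scenario: five rows labelled 0, 2, 4, 6, 8
df = pd.DataFrame({"A": [0.1, 0.2, 0.3, 0.4, 0.5],
                   "B": [1.1, 1.2, 1.3, 1.4, 1.5]},
                  index=range(0, 10, 2))

by_position = df.iloc[2]   # position 2 -> the row labelled 4, as a Series
by_label = df.loc[2]       # label 2 -> the second row, as a Series
one_row = df.iloc[[2]]     # list of positions -> a one-row DataFrame
row_slice = df[1:3]        # integer slice in [] is positional here

print(by_position.name, by_label.name)
print(row_slice.index.tolist())
```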
Upvotes: 810
Reputation: 61967
[] is primarily for selecting columns. When the indexing operator is passed a string or integer, it attempts to find a column with that particular name and return it as a Series.
So, in the question above, df[2] searches for a column name matching the integer value 2. This column does not exist, and a KeyError is raised.
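A small reproduction on a hypothetical frame shaped like the one in the question (columns A to D, a date index): the integer key is looked up among the column labels, finds nothing, and raises KeyError, while a column name works:

```python
import pandas as pd

# Hypothetical frame shaped like the one in the question
df = pd.DataFrame([[1, 2, 3, 4]] * 4, columns=list("ABCD"),
                  index=pd.date_range("2000-01-01", periods=4))

col = df["A"]        # string key -> column "A" as a Series
try:
    df[2]            # integer key -> looked up among the column labels
    raised = False
except KeyError:
    raised = True    # no column is labelled 2
print(raised)        # True
```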
Strangely, when given a slice, the DataFrame indexing operator selects rows and can do so by integer location or by index label.
df[2:3]
This slices from the row at integer location 2 up to location 3, excluding the end point, so it returns just a single row. The following selects every third row, starting at integer location 6 and stopping before 20.
df[6:20:3]
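On a hypothetical frame with a default integer index, these slice forms in [] select the same rows as the explicit .iloc forms:

```python
import pandas as pd

df = pd.DataFrame({"x": range(25)})  # default RangeIndex 0..24

print(df[2:3].index.tolist())     # [2]  a single row, end excluded
print(df[6:20:3].index.tolist())  # [6, 9, 12, 15, 18]

# The explicit positional equivalents give the same rows
print(df[2:3].equals(df.iloc[2:3]))        # True
print(df[6:20:3].equals(df.iloc[6:20:3]))  # True
```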
You can also use slices consisting of string labels if your DataFrame index has strings in it. For more details, see this solution on .iloc vs .loc.
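A sketch of label slicing with a hypothetical string index. Unlike the integer-position case, a label slice includes its end point, the same rule .loc uses:

```python
import pandas as pd

df = pd.DataFrame({"v": [1, 2, 3, 4]}, index=["a", "b", "c", "d"])

by_label = df["b":"c"]                    # label slice in []: both ends included
print(by_label.index.tolist())            # ['b', 'c']
print(by_label.equals(df.loc["b":"c"]))   # True
```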
I almost never use this slice notation with the indexing operator, as it's not explicit and is rarely used. When slicing by rows, stick with .loc/.iloc.
Upvotes: 123
Reputation: 439
You can loop through the rows of the DataFrame like this:
for ad in range(len(dataframe_c)):  # len() is the row count; .size is rows * columns
    print(dataframe_c.values[ad])
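If the goal is to visit each row by position, .iloc does the same job while keeping the column labels on each row. A minimal sketch, using a hypothetical stand-in for dataframe_c:

```python
import pandas as pd

# Hypothetical stand-in for dataframe_c
dataframe_c = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

rows = []
for ad in range(len(dataframe_c)):   # len() counts rows; .size would be 6 here
    row = dataframe_c.iloc[ad]       # Series for the row at position ad
    rows.append(row.tolist())
print(rows)  # [[1, 4], [2, 5], [3, 6]]
```

For large frames, vectorized operations or DataFrame.itertuples() are usually much faster than a Python loop over .iloc.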
Upvotes: 7
Reputation: 771
For positional access to the table, one can also convert the DataFrame to a NumPy array:
np_df = df.to_numpy()  # old pandas: df.as_matrix(), which was removed in 1.0
and then np_df[i] returns row i.
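A short sketch of the NumPy route on a hypothetical frame. df.as_matrix() no longer exists in pandas 1.0+, so this uses DataFrame.to_numpy(). Indexing the array by a single integer returns a row, but the row and column labels are lost and mixed column dtypes are upcast to a common dtype:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [0.5, 1.5, 2.5]}, index=[10, 20, 30])

np_df = df.to_numpy()   # 2-D ndarray; the int and float columns become float64
i = 1
print(np_df[i])         # row at position 1 as a 1-D array, labels dropped
print(np_df.dtype)      # float64
```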
Upvotes: 18
Reputation: 93774
You can take a look at the source code. DataFrame has a private method _slice() that slices the DataFrame, and its axis parameter determines which axis to slice. DataFrame.__getitem__() doesn't set the axis when it calls _slice(), so _slice() slices along the default axis, 0.
A simple experiment might help:
print(df._slice(slice(0, 2)))     # default axis (0): first two rows
print(df._slice(slice(0, 2), 0))  # axis 0: first two rows
print(df._slice(slice(0, 2), 1))  # axis 1: first two columns
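Because _slice() is private and may change between pandas versions, the same experiment can be run with the public .iloc indexer. A sketch on a hypothetical 3x3 frame, where iloc[0:2] slices along axis 0 (rows) and iloc[:, 0:2] along axis 1 (columns):

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=list("abc"))

rows_sliced = df.iloc[0:2]      # axis 0: first two rows, like df[0:2]
cols_sliced = df.iloc[:, 0:2]   # axis 1: first two columns

print(rows_sliced.shape)            # (2, 3)
print(cols_sliced.shape)            # (3, 2)
print(rows_sliced.equals(df[0:2]))  # True
```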
Upvotes: 7
Reputation: 97291
You can think of a DataFrame as a dict of Series. df[key] tries to select the column labelled key and returns a Series object.
However, a slice inside [] selects rows, because that is a very common operation.
You can read the documentation for details:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#basics
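The dict-of-Series analogy can be seen directly: build a DataFrame from a dict of Series (hypothetical data), and a key in [] returns the Series stored under that column label, while a slice in the same brackets selects rows:

```python
import pandas as pd

cols = {"x": pd.Series([1, 2, 3]), "y": pd.Series([4, 5, 6])}
df = pd.DataFrame(cols)

col_x = df["x"]      # key -> the column labelled "x", like cols["x"]
first_two = df[:2]   # slice -> rows at positions 0 and 1

print(col_x.equals(cols["x"]))  # True
print(first_two.shape)          # (2, 2)
```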
Upvotes: 34