Reputation: 840
I just began to learn Python and pandas, and I saw the use of iloc in many tutorials. It is always stated that you can use it to refer to rows and columns in a DataFrame. However, you can also do this directly without iloc. Here is an example where both statements yield the same output:
# features is just a dataframe with several rows and columns
features = pd.DataFrame(features_standardized)
y_train = features.iloc[start:end][[1]]
y_train_noIloc = features[start:end][[1]]
What is the difference between the two statements, and what advantage do I have when using iloc? I'd appreciate every comment.
Upvotes: 3
Views: 1927
Reputation: 9681
Per the pandas docs, iloc provides:
Purely integer-location based indexing for selection by position.
Therefore, as shown in the simple examples below, [row, col] indexing is not possible without using loc or iloc; with plain square brackets, a KeyError is raised.
Example:
# Build a simple, sample DataFrame.
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 2, 3, 4]})
# No iloc
>>> df[0, 0]
KeyError: (0, 0)
# With iloc:
>>> df.iloc[0, 0]
1
The same logic holds true when using loc and a column name.
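As a minimal sketch of the loc case, reusing the sample DataFrame from above: the label-based lookup works with loc, while plain brackets treat the pair as a single column key and fail. The variable names value_loc and raised are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4]})

# Label-based [row, col] lookup: row label 0, column label 'a'.
value_loc = df.loc[0, 'a']

# Plain brackets interpret (0, 'a') as one column key, which does not exist.
try:
    df[0, 'a']
    raised = False
except KeyError:
    raised = True

print(value_loc, raised)
```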
The short answer:
Use loc and/or iloc when indexing rows and columns together. If you index on rows only or on columns only, you can get away without them; selecting a range of rows this way is referred to as 'slicing'.
However, I see that your example uses [start:end][[1]]. It is generally considered bad practice to chain back-to-back square brackets in pandas (e.g. [][]), and it is usually an indication that a different (more efficient) approach should be taken. In this case, that means using iloc.
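To illustrate, here is a rough sketch of how the chained selection from the question could be written as a single iloc call. The array used for features_standardized and the values of start and end are made-up stand-ins, since the original data is not shown; the comparison assumes the default integer column labels that pandas assigns when a DataFrame is built from an array.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data: 4 rows, 3 columns.
features_standardized = np.arange(12, dtype=float).reshape(4, 3)
features = pd.DataFrame(features_standardized)
start, end = 1, 3  # assumed values for illustration

# Chained brackets: first a row slice, then a column selection on the result.
chained = features[start:end][[1]]

# One iloc call: rows start..end-1 and the column at position 1.
single = features.iloc[start:end, [1]]

print(chained)
print(single)
```

Both selections pick the same rows and the same column here. The single call does it in one step, and it selects the column by position rather than by label, which matters once the columns are no longer labeled 0, 1, 2.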
The longer answer:
Adapting your [start:end] slicing example (shown below), indexing works without iloc when slicing on rows only. The following example does not use iloc and returns rows 0 through 2, because the end of the slice is exclusive.
df[0:3]
Output:
   a
0  1
1  2
2  3
Note the difference between [0:3] and [0, 3]. The former (slicing) uses a colon and returns the rows at positions 0 through 2. The latter uses a comma and is a [row, col] indexer, which requires the use of iloc.
Aside:
The two methods can be combined as shown here, which returns rows 0 through 2 for column index 0. This is not possible without the use of iloc.
df.iloc[0:3, 0]
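A small sketch that puts the slicing forms side by side on the same sample DataFrame. The loc line is an addition not covered above, included only to contrast the endpoints: a label slice with loc includes its end label, while positional slices exclude the end.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4]})

# Plain row slice: a DataFrame with the rows at positions 0, 1, 2.
rows = df[0:3]

# Combined row slice and column position: a Series with the same three rows.
col = df.iloc[0:3, 0]

# Label-based slice with loc: the end label 3 is included, so four rows.
col_by_label = df.loc[0:3, 'a']

print(len(rows), col.tolist(), col_by_label.tolist())
```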
Upvotes: 2