PeterBe

Reputation: 840

What advantages does the iloc function have in pandas and Python?

I just began to learn Python and pandas, and I have seen the iloc function used in many tutorials. It is always stated that you can use this function to refer to columns and rows in a DataFrame. However, you can also do this directly without iloc. Here is an example where both approaches yield the same output:

import pandas as pd

# features is just a DataFrame with several rows and columns
# (features_standardized, start and end are defined elsewhere)
features = pd.DataFrame(features_standardized)

y_train = features.iloc[start:end][[1]]
y_train_noIloc = features[start:end][[1]]
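A minimal sketch to check that the two statements really do give the same result, assuming a small random DataFrame with pandas' default integer row index and integer column labels. The data, `rng`, `start` and `end` are placeholders, not the asker's actual values:

```python
import numpy as np
import pandas as pd

# Placeholder data: 10 rows, 3 columns, default RangeIndex and column labels 0, 1, 2
rng = np.random.default_rng(0)
features = pd.DataFrame(rng.standard_normal((10, 3)))
start, end = 2, 5  # placeholder slice bounds

# Positional row slice, then select the column labelled 1 (as a one-column DataFrame)
y_train = features.iloc[start:end][[1]]

# Plain [] row slice (an integer slice is positional here), then the same column selection
y_train_noIloc = features[start:end][[1]]

print(y_train.equals(y_train_noIloc))  # True for this default-index DataFrame
```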

What is the difference between the two statements, and what advantage does iloc give me? I'd appreciate any comments.

Upvotes: 3

Views: 1927

Answers (1)

s3dev

Reputation: 9681

Per the pandas docs, iloc provides:

Purely integer-location based indexing for selection by position.

As the simple example below shows, [row, col] indexing is not possible without loc or iloc: plain df[row, col] raises a KeyError, because pandas looks up the whole tuple as a single column label.

Example:

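import pandas as pd

# Build a simple, sample DataFrame.
df = pd.DataFrame({'a': [1, 2, 3, 4]})

# No iloc: the tuple (0, 0) is looked up as a single column label
>>> df[0, 0]
KeyError: (0, 0)

# With iloc: row position 0, column position 0
>>> df.iloc[0, 0]
1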

The same logic holds for loc, which takes row and column labels instead of positions, e.g. df.loc[0, 'a'].
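To make the position-versus-label distinction visible, here is a small sketch that reuses the sample data but gives it a custom index (the index values 10, 20, 30, 40 are an illustrative assumption), so that row positions and row labels no longer coincide:

```python
import pandas as pd

# Same data as above, but with a custom index so labels and positions differ
df = pd.DataFrame({'a': [1, 2, 3, 4]}, index=[10, 20, 30, 40])

# iloc: row position 0, column position 0
print(df.iloc[0, 0])      # 1

# loc: row label 10, column label 'a' (the same cell)
print(df.loc[10, 'a'])    # 1

# loc with a position instead of a label fails: there is no row labelled 0
try:
    df.loc[0, 'a']
except KeyError as exc:
    print('KeyError:', exc)

# Plain [] with a (row, col) tuple looks up the tuple as a column label
try:
    df[0, 0]
except KeyError as exc:
    print('KeyError:', exc)
```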

What is the difference and when does the indexing work without iloc?

The short answer:
Use loc and/or iloc when indexing rows and columns together. If you index on rows only or on columns only, you can get away without them; selecting a range of rows this way is referred to as 'slicing'.
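The short answer can be sketched as follows, using a hypothetical two-column DataFrame so that column selection is meaningful:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8]})

# Rows only: a slice inside [] selects rows by position
rows = df[1:3]              # rows 1 and 2, both columns

# Columns only: a label (or list of labels) inside [] selects columns
col_a = df['a']             # a Series
cols = df[['a', 'b']]       # a DataFrame

# Rows and columns together: use iloc (positions) or loc (labels)
cell = df.iloc[1, 1]        # row position 1, column position 1 -> 6
block = df.iloc[1:3, [1]]   # rows 1 and 2, column 'b', as a DataFrame
```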

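However, I see that your example uses [start:end][[1]]. Back-to-back square brackets in pandas (e.g. [][]) are known as chained indexing, and this is generally considered bad practice: each pair of brackets is a separate selection, and when used to assign values it may not modify the original DataFrame. It is usually an indication that a different (more efficient) approach should be taken; in this case, a single iloc call such as features.iloc[start:end, [1]]. Note that [[1]] selects the column labelled 1, while iloc's [1] selects the column at position 1; the two coincide only when the columns have the default labels 0, 1, 2, ...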
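A sketch comparing the chained form with a single iloc call, assuming placeholder integer data and placeholder `start`/`end` values. The rename to 'x', 'y', 'z' is a hypothetical step added only to show where label-based [[1]] and position-based iloc stop agreeing:

```python
import numpy as np
import pandas as pd

# Placeholder data with default integer column labels 0, 1, 2
features = pd.DataFrame(np.arange(30).reshape(10, 3))
start, end = 2, 5

# Chained indexing: two separate selection operations
chained = features.iloc[start:end][[1]]

# Single iloc call: rows by position, and [1] selects the column at position 1
single = features.iloc[start:end, [1]]

print(chained.equals(single))  # True here, because column label 1 is also position 1

# With named columns, [[1]] would look for a column *labelled* 1 and raise KeyError,
# while iloc keeps working by position.
named = features.rename(columns={0: 'x', 1: 'y', 2: 'z'})
print(named.iloc[start:end, [1]].columns.tolist())  # ['y']
```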

The longer answer:
Adapting your [start:end] slicing example (shown below), indexing works without iloc when indexing (slicing) on rows only. The following example does not use iloc and returns rows 0 through 2; as with Python lists, the end of the slice is excluded.

df[0:3]

Output:

   a
0  1
1  2
2  3
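The exclusive end is a property of positional slicing. A short sketch on the same sample DataFrame contrasts it with label-based slicing through loc, which includes the end label:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4]})

# Positional slicing (plain [] or iloc) excludes the end position, like Python lists
print(df[0:3].index.tolist())       # [0, 1, 2]
print(df.iloc[0:3].index.tolist())  # [0, 1, 2]

# Label-based slicing with loc includes the end label
print(df.loc[0:3].index.tolist())   # [0, 1, 2, 3]
```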

Note the difference between [0:3] and [0, 3]. The former (slicing) uses a colon and returns rows 0 through 2. The latter uses a comma and is a [row, col] indexer, which requires iloc (or loc).

Aside:
The two methods can be combined, as shown here, to return rows 0 through 2 for column position 0. This is not possible with plain [] indexing; it requires iloc (or loc with labels).

df.iloc[0:3, 0]
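One detail worth knowing about the combined form: a scalar column position returns a Series, while a list of positions returns a DataFrame, which is the iloc counterpart of the [[1]] in the question. A sketch on the same sample DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4]})

# A scalar column position returns a Series
s = df.iloc[0:3, 0]
print(type(s).__name__, s.tolist())   # Series [1, 2, 3]

# A list of column positions returns a (one-column) DataFrame
d = df.iloc[0:3, [0]]
print(type(d).__name__, d.shape)      # DataFrame (3, 1)
```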

Upvotes: 2
