Why do loc and iloc work differently for slicing rows of a pandas DataFrame?

Question

I want a DataFrame where the top rows of one column (called 'cat') have value "LOW", the mid and bottom parts of the frame will have values "MID" and "HI". So, for a frame of 1,200 rows, the value counts for the cat columns should result in:

LOW    400
MID    400
HI     400

This should be easy. But, apparently it is not really. To no avail I tried to select and change the bottom rows using df.loc[-400:,["cat"]] = "HI"

But, this approach does work for the top-rows: df.loc[:399,["cat"]] = "LOW"

The sample below shows a working example, and note that it requires both loc and iloc. Is this where pandas can improve?

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.random([1200, 4]), columns=['A', 'B', 'C', 'D'])
df["cat"] = "MID"
df.loc[:399,["cat"]] = "LOW"
df.iloc[-400:,-1] = "HI"  # The -1 selects the last column ('cat') - not ideal.
df.cat.value_counts()

jezrael · Accepted Answer

Use get_loc for position of column cat if want select by positions by iloc - need positions of index and columns:

df = pd.DataFrame(np.random.random([1200, 4]), columns=['A', 'B', 'C', 'D'])
df["cat"] = "MID"

df.iloc[:400,df.columns.get_loc('cat')] = "LOW"
df.iloc[-400:,df.columns.get_loc('cat')] = "HI"

Detail:

print (df.columns.get_loc('cat'))
4

Alternative is use loc for select by labels - then need select 400 values of index by indexing:

df.loc[df.index[:400],"cat"] = "LOW"
df.loc[df.index[-400:],"cat"] = "HI"

a = df.cat.value_counts()
print (a)
MID    400
HI     400
LOW    400
Name: cat, dtype: int64

Another ways for set 400 values use numpy.repeat or set values by repeat of lists:

df["cat"] =  np.array(["LOW", "MID", "HI"]).repeat(400)

df["cat"] =  ["LOW"] * 400 + ["MID"] * 400 +  ["HI"] * 400
#thanks  @Quickbeam2k1
df = df.assign(cat = ['LOW']*400 + ['MID']*400 + ['HIGH']*400 )

Why do loc and iloc work differently for slicing rows of a pandas DataFrame?

Answers (2)

Related Questions