RyanKao

Reputation: 331

Pandas: using iloc to retrieve data does not match input index

I have a dataset that contains each contributor's id and contributor_message. I want to retrieve all rows with the same message, say, contributor_message == 'I support this proposal because...'.

I use data.loc[data.contributor_message == 'I support this proposal because...'].index to get the index values of the rows with that message. Say those indices are 1, 2, 50, 9350, 30678, ...

Then I tried data.iloc[[1,2,50]], which gives the correct answer, i.e. the indices match the DataFrame indices.

However, when I use data.iloc[9350] or higher indices, I do NOT get the corresponding DataFrame index. For example, this time I got the row with index 15047.

Can anyone advise how to fix this problem?

Upvotes: 4

Views: 3896

Answers (1)

jpp

Reputation: 164623

This occurs when your index labels are not aligned with their integer locations.

Note that pd.DataFrame.loc selects by index label and pd.DataFrame.iloc selects by integer location.

Below is a minimal example.

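import pandas as pd

# Label 3 is missing, so labels and integer positions diverge from the fourth row on
df = pd.DataFrame({'A': [1, 2, 1, 1, 5]}, index=[0, 1, 2, 4, 5])

idx = df[df['A'] == 1].index

print(idx)  # Int64Index([0, 2, 4], dtype='int64'); pandas >= 2.0 prints Index([0, 2, 4], dtype='int64')

res1 = df.loc[idx]   # treats 0, 2, 4 as index labels
res2 = df.iloc[idx]  # treats 0, 2, 4 as integer positions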

print(res1)
#    A
# 0  1
# 2  1
# 4  1

print(res2)
#    A
# 0  1
# 2  1
# 5  5
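To make the mismatch explicit, the sketch below maps the labels from the example to their integer positions with Index.get_indexer. The variable names labels, positions and same are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1, 1, 5]}, index=[0, 1, 2, 4, 5])

labels = df[df['A'] == 1].index            # index labels of the matching rows
positions = df.index.get_indexer(labels)   # integer positions of those labels

print(labels.tolist())     # [0, 2, 4]
print(positions.tolist())  # [0, 2, 3]

# Label 4 sits at position 3, which is why iloc[4] returned the row labelled 5.
same = df.loc[labels].equals(df.iloc[positions])
print(same)  # True
```

Passing labels to loc and the corresponding positions to iloc selects the same rows.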

You have 2 options to resolve this problem.

Option 1

Use pd.DataFrame.loc to slice by index, as above.
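Applied to the question, a minimal sketch might look like the following. The data is made up for illustration (the contributor_id column name, the shortened message and the index values are assumptions), with a non-contiguous index like the one described in the question.

```python
import pandas as pd

# Hypothetical data with gaps in the index, e.g. after rows were dropped
data = pd.DataFrame(
    {
        'contributor_id': [11, 12, 13, 14],
        'contributor_message': [
            'I support this proposal',
            'No',
            'I support this proposal',
            'Maybe',
        ],
    },
    index=[1, 2, 50, 9350],
)

mask = data['contributor_message'] == 'I support this proposal'
matches = data.loc[mask]  # the boolean mask selects rows directly

print(matches.index.tolist())  # [1, 50]
```

Passing the boolean mask straight to loc avoids the round trip through .index, so the label/position distinction never comes up.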

Option 2

Reset index and use pd.DataFrame.iloc:

df = df.reset_index(drop=True)
idx = df[df['A'] == 1].index

res2 = df.iloc[idx]

print(res2)
#    A
# 0  1
# 2  1
# 3  1

Upvotes: 7
