Reputation: 331
I have a dataset which contains contributor's id and contributor_message. I wanted to retrieve all samples with the same message, say, contributor_message == 'I support this proposal because...'.
I use data.loc[data.contributor_message == 'I support this proposal because...'].index -> so basically you can get the index in the DataFrame with the same message, say those indices are 1, 2, 50, 9350, 30678,...
Then I tried data.iloc[[1,2,50]] and this gives me correct answer, i.e. the indices matches with the DataFrame indices.
However, when I use data.iloc[9350] or higher indices, I will NOT get the corresponding DataFrame index. Say I got 15047 in the DataFrame this time.
Can anyone advise how to fix this problem?
Upvotes: 4
Views: 3896
Reputation: 164623
This occurs when your indices are not aligned with their integer location.
Note that pd.DataFrame.loc
is used to slice by index and pd.DataFrame.iloc
is used to slice by integer location.
Below is a minimal example.
df = pd.DataFrame({'A': [1, 2, 1, 1, 5]}, index=[0, 1, 2, 4, 5])
idx = df[df['A'] == 1].index
print(idx) # Int64Index([0, 2, 4], dtype='int64')
res1 = df.loc[idx]
res2 = df.iloc[idx]
print(res1)
# A
# 0 1
# 2 1
# 4 1
print(res2)
# A
# 0 1
# 2 1
# 5 5
You have 2 options to resolve this problem.
Option 1
Use pd.DataFrame.loc
to slice by index, as above.
Option 2
Reset index and use pd.DataFrame.iloc
:
df = df.reset_index(drop=True)
idx = df[df['A'] == 1].index
res2 = df.iloc[idx]
print(res2)
# A
# 0 1
# 2 1
# 3 1
Upvotes: 7