Sledge

Reputation: 1345

Question about drop=True in pd.DataFrame.reset_index()

In a pandas DataFrame, it's possible to reset the index using the reset_index() method. One optional argument is drop=True, which the documentation describes as follows:

drop : bool, default False
    Do not try to insert index into dataframe columns. 
    This resets the index to the default integer index.

My question is, what does the first sentence mean? Will it try to convert an integer index to a new column in my df if I leave it as False?

Also, will my row order be preserved or should I also sort to ensure proper ordering?

Upvotes: 3

Views: 6304

Answers (1)

Cohan

Reputation: 4564

As you can see below, df.reset_index() will move the index into the dataframe as a column. If the index was just a generic numerical index, you probably don't care about it and can just discard it. Below is a simple dataframe, but I dropped the first row just to have differing values in the index.

import pandas as pd

df = pd.DataFrame([['a', 10], ['b', 20], ['c', 30], ['d', 40]], columns=['letter','number'])
df = df[df.number > 10]
print(df)
#   letter  number
# 1      b      20
# 2      c      30
# 3      d      40

The default behavior (drop=False) adds a column named index that holds the previous index values. You can see that the index column matches the index from above, while the index itself has been renumbered starting from 0.

print(df.reset_index())
#    index letter  number
# 0      1      b      20
# 1      2      c      30
# 2      3      d      40
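
The new column is only called index when the old index has no name. As a small sketch (the name row_id is a hypothetical label picked for illustration, not something from the question), giving the index a name first should carry that name over to the inserted column:

```python
import pandas as pd

df = pd.DataFrame(
    [["a", 10], ["b", 20], ["c", 30], ["d", 40]],
    columns=["letter", "number"],
)
df = df[df.number > 10]

# "row_id" is a hypothetical name, chosen only for this example
df.index.name = "row_id"

# The old index becomes a column that takes the index's name
restored = df.reset_index()
print(restored.columns.tolist())    # ['row_id', 'letter', 'number']
print(restored["row_id"].tolist())  # [1, 2, 3]
```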

drop=True doesn't pretend the old index was important: it discards it instead of keeping it as a column, and just gives you a new index.

print(df.reset_index(drop=True))
#   letter  number
# 0      b      20
# 1      c      30
# 2      d      40

Regarding row order, it is maintained: reset_index() only replaces the index labels and does not reorder the rows. That said, if a later step depends on a particular order (an order-sensitive aggregation, for example), sort explicitly before that step rather than relying on the order the data happened to arrive in.
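
A quick way to check the row order for yourself is to put the index labels deliberately out of order and then reset. This is a minimal sketch reusing the example frame from above; the variable names shuffled and result are just for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    [["a", 10], ["b", 20], ["c", 30], ["d", 40]],
    columns=["letter", "number"],
)

# Sort descending so the index labels run 3, 2, 1, 0
shuffled = df.sort_values("number", ascending=False)

# reset_index relabels the rows 0..n-1 without moving them
result = shuffled.reset_index(drop=True)

order_kept = result["letter"].tolist() == shuffled["letter"].tolist()
print(result["letter"].tolist())  # ['d', 'c', 'b', 'a']
print(result.index.tolist())      # [0, 1, 2, 3]
print(order_kept)                 # True
```

If the rows are wanted back in their original label order instead, call sort_index() before resetting.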

Upvotes: 5
