Reputation: 1345
In a Pandas dataframe, it's possible to reset the index using the reset_index() method. One optional argument is drop=True, which according to the documentation:
drop : bool, default False
Do not try to insert index into dataframe columns.
This resets the index to the default integer index.
My question is, what does the first sentence mean? Will it try to convert an integer index to a new column in my df if I leave it False?
Also, will my row order be preserved or should I also sort to ensure proper ordering?
Upvotes: 3
Views: 6304
Reputation: 4564
As you can see below, df.reset_index() will move the index into the dataframe as a column. If the index was just a generic numerical index, you probably don't care about it and can just discard it. Below is a simple dataframe, but I dropped the first row just to have differing values in the index.
import pandas as pd

df = pd.DataFrame([['a', 10], ['b', 20], ['c', 30], ['d', 40]], columns=['letter', 'number'])
# Filter out the first row so the remaining index labels start at 1
df = df[df.number > 10]
print(df)
#   letter  number
# 1      b      20
# 2      c      30
# 3      d      40
Default behavior now shows a column named index which was the previous index. You can see that df['index'] matches the index from above, but the index has been renumbered starting from 0.
print(df.reset_index())
#    index letter  number
# 0      1      b      20
# 1      2      c      30
# 2      3      d      40
drop=True doesn't pretend the index was important: it discards the old index and just gives you a new one.
print(df.reset_index(drop=True))
#   letter  number
# 0      b      20
# 1      c      30
# 2      d      40
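If you'd rather check the difference than read it off printed output, here is a minimal sketch using the same toy data. The names kept and dropped are just illustrative variables I made up, not anything from pandas.

```python
import pandas as pd

df = pd.DataFrame([['a', 10], ['b', 20], ['c', 30], ['d', 40]],
                  columns=['letter', 'number'])
df = df[df.number > 10]  # index labels are now 1, 2, 3

kept = df.reset_index()              # old index becomes a column named 'index'
dropped = df.reset_index(drop=True)  # old index is discarded

print(kept.columns.tolist())     # ['index', 'letter', 'number']
print(dropped.columns.tolist())  # ['letter', 'number']
print(kept['index'].tolist())    # [1, 2, 3]
print(dropped.index.tolist())    # [0, 1, 2]
```

In both cases the new index is the default 0, 1, 2; the only difference is whether the old labels survive as a column.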
Regarding row order, reset_index() only replaces the index labels and does not reorder the rows, so the order is maintained. Still, the order in which things happen to be stored should not be relied on in general. If you are performing an aggregate function that depends on ordering, you probably want to sort explicitly to make sure the data is ordered properly for the aggregation.
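To check the row-order question on a small case, here is a sketch with a made-up frame whose index labels are deliberately out of order. The data and the names reset and ordered are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data with non-sequential index labels
df = pd.DataFrame({'letter': ['c', 'a', 'b'], 'number': [30, 10, 20]},
                  index=[7, 3, 5])

reset = df.reset_index(drop=True)
# Rows stay in the same positions; only the labels change to 0, 1, 2
print(reset['letter'].tolist())  # ['c', 'a', 'b']

# If a later step needs a specific order, sort explicitly and then renumber
ordered = df.sort_values('number').reset_index(drop=True)
print(ordered['letter'].tolist())  # ['a', 'b', 'c']
```

Sorting before resetting keeps the new 0-based index aligned with the order you actually want, instead of whatever order the rows arrived in.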
Upvotes: 5