Reputation: 135
In a Jupyter notebook I printed df.info(), and the result is:
print(df.info())
<class 'pandas.core.frame.DataFrame'>
Int64Index: 20620 entries, 0 to 24867
Data columns (total 3 columns):
neighborhood 20620 non-null object
bedrooms 20620 non-null float64
price 20620 non-null float64
dtypes: float64(2), object(1)
memory usage: 644.4+ KB
Why does it show 20620 entries, from 0 to 24867? I expected the last number (24867) to be 20620 or 20619.
Upvotes: 1
Views: 1063
Reputation: 879083
It means that not every possible index value has been used. For example,
In [13]: df = pd.DataFrame([10,20], index=[0,100])
In [14]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 100
Data columns (total 1 columns):
0 2 non-null int64
dtypes: int64(1)
memory usage: 32.0 bytes
df has 2 entries, but the Int64Index "ranges" from 0 to 100.
DataFrames can easily end up like this if rows have been deleted, or if df is a sub-DataFrame of another DataFrame.
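As a rough illustration of how deleted rows leave gaps, here is a minimal sketch. The column names mirror the question, but the values are made up and dropna() is only one assumed way the rows might have been removed:

```python
import pandas as pd

# Hypothetical data; only the column names come from the question.
df = pd.DataFrame({
    "bedrooms": [1.0, None, 3.0, None, 2.0],
    "price": [100.0, 150.0, 300.0, 120.0, 200.0],
})

# Dropping rows keeps the surviving rows' original labels.
filtered = df.dropna()

n_rows = len(filtered)            # 3 rows remain
first_label = filtered.index[0]   # 0
last_label = filtered.index[-1]   # 4, not 2

print(n_rows, first_label, last_label)
```

The row count and the last label disagree for the same reason as in the question: the labels 1 and 3 were removed, and nothing renumbered the rest.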
If you reset the index, the index labels will be renumbered in order, starting from 0:
In [17]: df.reset_index(drop=True)
Out[17]:
0
0 10
1 20
In [18]: df.reset_index(drop=True).info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 1 columns):
0 2 non-null int64
dtypes: int64(1)
memory usage: 96.0 bytes
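Note that reset_index returns a new DataFrame and, by default, leaves the original untouched. A small sketch of keeping the renumbered result, reusing the two-row example from above:

```python
import pandas as pd

df = pd.DataFrame([10, 20], index=[0, 100])

# Calling reset_index without assigning the result does not change df.
df.reset_index(drop=True)
labels_before = list(df.index)   # still [0, 100]

# Assign the result back to keep the renumbered index.
df = df.reset_index(drop=True)
labels_after = list(df.index)    # [0, 1]

print(labels_before, labels_after)
```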
To be more precise, as Chris points out, the line
Int64Index: 2 entries, 0 to 100
is merely reporting the first and last value in the Int64Index. It's not reporting min or max values. There can be higher or lower integers in the index:
In [32]: pd.DataFrame([10,20,30], index=[50,0,50]).info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3 entries, 50 to 50 # notice index value 0 is not mentioned
Data columns (total 1 columns):
0 3 non-null int64
dtypes: int64(1)
memory usage: 48.0 bytes
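If you want the actual smallest and largest labels rather than the first and last ones, you can ask the index directly. A short sketch using the same three-row example:

```python
import pandas as pd

df = pd.DataFrame([10, 20, 30], index=[50, 0, 50])

# What info() reports: the first and last labels, in stored order.
first_label = df.index[0]     # 50
last_label = df.index[-1]     # 50

# The actual extremes of the index.
lowest = df.index.min()       # 0
highest = df.index.max()      # 50

# Labels need not be unique either.
is_unique = df.index.is_unique  # False

print(first_label, last_label, lowest, highest, is_unique)
```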
Upvotes: 2