Reza energy

Reputation: 135

Pandas df.info(): the reported number of entries and the index range do not match

In a Jupyter notebook I printed df.info(), and the result is:

print(df.info())   

<class 'pandas.core.frame.DataFrame'>
Int64Index: 20620 entries, 0 to 24867
Data columns (total 3 columns):
neighborhood    20620 non-null object
bedrooms        20620 non-null float64
price           20620 non-null float64
dtypes: float64(2), object(1)
memory usage: 644.4+ KB

Why does it show 20620 entries, from 0 to 24867? I expected the last number (24867) to be 20620 or 20619.

Upvotes: 1

Views: 1063

Answers (1)

unutbu

Reputation: 879083

It means that not every possible index value has been used. For example,

In [13]: df = pd.DataFrame([10,20], index=[0,100])

In [14]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 100
Data columns (total 1 columns):
0    2 non-null int64
dtypes: int64(1)
memory usage: 32.0 bytes

df has 2 entries, but the Int64Index "ranges" from 0 to 100.

DataFrames can easily end up like this if rows have been deleted, or if df is a sub-DataFrame of another DataFrame.
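
As a rough sketch of how that happens: the column names and values below are made up for illustration (loosely modeled on the question's frame), and dropna() is only one of several operations that remove rows while keeping the original labels of the rows that survive.

```python
import pandas as pd

# Hypothetical data; the values are invented for this example.
df = pd.DataFrame({
    "bedrooms": [1.0, None, 2.0, None, 3.0],
    "price": [100.0, 200.0, 300.0, 400.0, 500.0],
})

# Dropping rows does not renumber the index: surviving rows keep their labels.
cleaned = df.dropna()

print(len(cleaned))         # 3
print(list(cleaned.index))  # [0, 2, 4]
```

Here info() on cleaned would report 3 entries with labels running from 0 to 4, which is the same kind of mismatch as 20620 entries, 0 to 24867.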

If you reset the index, the index labels will be renumbered in order, starting from 0:

In [17]: df.reset_index(drop=True)
Out[17]: 
    0
0  10
1  20

In [18]: df.reset_index(drop=True).info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 1 columns):
0    2 non-null int64
dtypes: int64(1)
memory usage: 96.0 bytes

To be more precise, as Chris points out, the line

Int64Index: 2 entries, 0 to 100

is merely reporting the first and last value in the Int64Index. It's not reporting min or max values. There can be higher or lower integers in the index:

In [32]: pd.DataFrame([10,20,30], index=[50,0,50]).info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3 entries, 50 to 50  # notice index value 0 is not mentioned
Data columns (total 1 columns):
0    3 non-null int64
dtypes: int64(1)
memory usage: 48.0 bytes
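
If the actual extremes matter, they can be read from the index directly rather than from the info() summary. A minimal sketch using the same small frame as above (the variable names are just for illustration):

```python
import pandas as pd

df = pd.DataFrame([10, 20, 30], index=[50, 0, 50])

# What the info() header reports: the first and last labels, in stored order.
first, last = df.index[0], df.index[-1]

# The actual smallest and largest labels.
lowest, highest = df.index.min(), df.index.max()

print(first, last)                        # 50 50
print(lowest, highest)                    # 0 50
print(df.index.is_unique)                 # False (50 appears twice)
print(df.index.is_monotonic_increasing)   # False (labels are not sorted)
```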

Upvotes: 2
