Giovanni

Reputation: 3

DataFrame.info() and Series.describe() show different dtypes for the same column

I have a problem using Pandas.

When I execute autos.info() it returns:

RangeIndex: 371528 entries, 0 to 371527
Data columns (total 20 columns):
 #   Column               Non-Null Count   Dtype 
---  ------               --------------   ----- 
 0   dateCrawled          371528 non-null  object
 1   name                 371528 non-null  object
 2   seller               371528 non-null  object
 3   offerType            371528 non-null  object
 4   price                371528 non-null  int64 
 5   abtest               371528 non-null  object
 6   vehicleType          333659 non-null  object
 7   yearOfRegistration   371528 non-null  int64 
 8   gearbox              351319 non-null  object
 9   powerPS              371528 non-null  int64 
 10  model                351044 non-null  object
 11  kilometer            371528 non-null  int64 
 12  monthOfRegistration  371528 non-null  int64 
 13  fuelType             338142 non-null  object
 14  brand                371528 non-null  object
 15  notRepairedDamage    299468 non-null  object
 16  dateCreated          371528 non-null  object
 17  nrOfPictures         371528 non-null  int64 
 18  postalCode           371528 non-null  int64 
 19  lastSeen             371528 non-null  object
dtypes: int64(7), object(13)
memory usage: 56.7+ MB

But when I execute autos["price"].describe() it returns:

count    3.715280e+05
mean     1.729514e+04
std      3.587954e+06
min      0.000000e+00
25%      1.150000e+03
50%      2.950000e+03
75%      7.200000e+03
max      2.147484e+09
Name: price, dtype: float64

I don't understand why the price column is reported as int64 by info() but as float64 by describe().

Any suggestions?

Upvotes: 0

Views: 298

Answers (2)

maow

Reputation: 2887

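Series.describe() returns a new Series containing the descriptive statistics. The dtype shown at the bottom of that output is the dtype of the result, not the dtype of the original column. Because some statistics (mean, std) are generally not whole numbers and a Series holds a single dtype, pandas stores all of them as float64, including count, min and max. The column itself is unchanged: autos["price"].dtype is still int64. The result is named price because describe() keeps the name of the Series it was called on, autos["price"].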
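A minimal sketch of this behaviour, using a small hypothetical stand-in for the question's autos DataFrame (the price values below are invented for illustration, not taken from the real data). It checks the dtype of the column and of the describe() result separately.

```python
import pandas as pd

# Hypothetical stand-in for the question's autos DataFrame; values are made up.
autos = pd.DataFrame({"price": [0, 1150, 2950, 7200, 12000]})

stats = autos["price"].describe()

# The original column keeps its integer dtype...
print(autos["price"].dtype)  # int64

# ...but the statistics Series is float64, since mean and std are floats
# and one Series can only hold one dtype.
print(stats.dtype)  # float64

# describe() reuses the name of the Series it was called on.
print(stats.name)  # price
```

So the two outputs in the question do not contradict each other: info() describes the column, describe() describes its own result.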

Upvotes: 1

r-beginners

Reputation: 35115

Would controlling the number of displayed digits give you the output you want? For example, either set a global float format or format the describe() result directly:

pd.set_option('display.float_format', lambda x: '%.5f' % x)

df['X'].describe().apply("{0:.5f}".format)
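A runnable sketch of both suggestions, assuming a hypothetical DataFrame df with a numeric column X (the data is invented). It uses pd.option_context instead of pd.set_option so the display format applies only inside the with block. Note that the .apply(...format) variant turns every statistic into a string (object dtype), so it is suited to display rather than further arithmetic.

```python
import pandas as pd

# Hypothetical data; 'X' is the placeholder column name used in the answer.
df = pd.DataFrame({"X": [0, 1150, 2950, 7200, 12000]})
stats = df["X"].describe()

# Option 1: change only how floats are displayed; the data stays float64.
# option_context restores the previous setting when the block exits.
with pd.option_context("display.float_format", lambda x: "%.5f" % x):
    print(stats)

# Option 2: convert each statistic to a fixed-precision string.
# The result has object dtype, so use it for display only.
formatted = stats.apply("{0:.5f}".format)
print(formatted)
```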

Upvotes: 0
