AR_

Reputation: 478

Pandas df.describe doesn't work after adding new column

I've got a Pandas dataframe with 118 columns and I'd like to add a new column 'x119'. I tried using various methods which all seem to work like:

df = df.assign(x119=F)

or:

df.loc[:,'x119'] = F

The methods seem to add the column to the df dataframe but when I use:

df.describe()

I still get 118 columns. Has anyone encountered this situation? The column seems to exist when I call df['x119'], but it is not shown in the output of df.describe().

EDIT: The values of F are categorical with numeric values of 1, 2, 3. The column 'x119' did not exist in df before, and when I use df2 = df and then df2.describe() it works fine and I can see all columns.

Upvotes: 1

Views: 1042

Answers (2)

Mohamed Ali JAMAOUI

Reputation: 14689

Case 1: all datatypes are numeric:

df.describe() works fine after df.assign(..) when all datatypes are numeric. Here's a reproducible example:

>>> import pandas as pd
>>> df = pd.DataFrame([[1,2],[3,4]], columns=list('AB'))
>>> df
   A  B
0  1  2
1  3  4
>>> import numpy as np 
>>> df["C"] = np.nan 
>>> df
   A  B   C
0  1  2 NaN
1  3  4 NaN
>>> df.describe()
              A         B    C
count  2.000000  2.000000  0.0
mean   2.000000  3.000000  NaN
std    1.414214  1.414214  NaN
min    1.000000  2.000000  NaN
25%    1.500000  2.500000  NaN
50%    2.000000  3.000000  NaN
75%    2.500000  3.500000  NaN
max    3.000000  4.000000  NaN
>>> df.assign(D=5)
   A  B   C  D
0  1  2 NaN  5
1  3  4 NaN  5
>>> df.describe()
              A         B    C
count  2.000000  2.000000  0.0
mean   2.000000  3.000000  NaN
std    1.414214  1.414214  NaN
min    1.000000  2.000000  NaN
25%    1.500000  2.500000  NaN
50%    2.000000  3.000000  NaN
75%    2.500000  3.500000  NaN
max    3.000000  4.000000  NaN
>>> df  = df.assign(D=5)
>>> df.describe()
              A         B    C    D
count  2.000000  2.000000  0.0  2.0
mean   2.000000  3.000000  NaN  5.0
std    1.414214  1.414214  NaN  0.0
min    1.000000  2.000000  NaN  5.0
25%    1.500000  2.500000  NaN  5.0
50%    2.000000  3.000000  NaN  5.0
75%    2.500000  3.500000  NaN  5.0
max    3.000000  4.000000  NaN  5.0
>>> 
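  • Make sure you assign the result of df.assign back to df, e.g. df = df.assign(...). df.assign returns a new DataFrame and does not modify df in place, which is why the first df.describe() above still shows only A, B and C.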
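The point in the bullet above can be checked directly. This is a minimal sketch reusing the small two-column frame from the example; the variable names `result`, `cols_before`, `cols_result` and `described_cols` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=list("AB"))

# assign returns a new DataFrame; the original df is left unchanged.
result = df.assign(D=5)
cols_before = list(df.columns)      # columns of the untouched original
cols_result = list(result.columns)  # columns of the returned copy

# Rebinding the name keeps the new column, so describe() now reports it.
df = df.assign(D=5)
described_cols = list(df.describe().columns)

print(cols_before, cols_result, described_cols)
```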

Case 2: mixed numeric and object datatypes:

For mixed object and numeric datatypes, you need to call df.describe(include='all'), as mentioned in the Notes section of the documentation:

For mixed data types provided via a DataFrame, the default is to return only an analysis of numeric columns. If include='all' is provided as an option, the result will include a union of attributes of each type.

>>> df["E"] = ['1','2']
>>> df
   A  B   C  D  E
0  1  2 NaN  5  1
1  3  4 NaN  5  2
>>> df.describe()
              A         B    C    D
count  2.000000  2.000000  0.0  2.0
mean   2.000000  3.000000  NaN  5.0
std    1.414214  1.414214  NaN  0.0
min    1.000000  2.000000  NaN  5.0
25%    1.500000  2.500000  NaN  5.0
50%    2.000000  3.000000  NaN  5.0
75%    2.500000  3.500000  NaN  5.0
max    3.000000  4.000000  NaN  5.0
>>> df
   A  B   C  D  E
0  1  2 NaN  5  1
1  3  4 NaN  5  2
>>> 

so you need to call describe as follows:

>>> df.describe(include='all')
               A         B    C    D    E
count   2.000000  2.000000  0.0  2.0    2
unique       NaN       NaN  NaN  NaN    2
top          NaN       NaN  NaN  NaN    2
freq         NaN       NaN  NaN  NaN    1
mean    2.000000  3.000000  NaN  5.0  NaN
std     1.414214  1.414214  NaN  0.0  NaN
min     1.000000  2.000000  NaN  5.0  NaN
25%     1.500000  2.500000  NaN  5.0  NaN
50%     2.000000  3.000000  NaN  5.0  NaN
75%     2.500000  3.500000  NaN  5.0  NaN
max     3.000000  4.000000  NaN  5.0  NaN
>>> 
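The same default applies to categorical columns, which matches the EDIT in the question (F is categorical with values 1, 2, 3). The sketch below assumes F has pandas' category dtype; the three-row frame and the `pd.Categorical([1, 2, 3])` stand-in for F are made up for illustration and are not the asker's data.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=list("AB"))

# Hypothetical stand-in for F: a categorical column with numeric labels.
F = pd.Categorical([1, 2, 3])
df["x119"] = F

# Default describe() on a mixed frame reports numeric columns only,
# so the categorical column is left out.
default_cols = list(df.describe().columns)

# include='all' reports every column; include=['category'] only categoricals.
all_cols = list(df.describe(include="all").columns)
cat_cols = list(df.describe(include=["category"]).columns)

# If the labels should be treated as numbers, cast the column first.
numeric_cols = list(df.assign(x119=df["x119"].astype(int)).describe().columns)

print(default_cols, all_cols, cat_cols, numeric_cols)
```

Whether to cast to int or keep the category dtype depends on whether statistics such as mean and std are meaningful for the labels.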

Upvotes: 1

jezrael

Reputation: 863166

I think the problem is that the x119 column already existed in df, so the assignment only overwrote its values.

You can check it by:

print (df['x119'])

The simplest way to add a new column is:

print (len(df.columns))
df['x119'] = F
print (len(df.columns))
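A membership test on df.columns is another way to run this check, since it does not raise a KeyError when the column is missing. This is a small sketch on a made-up one-column frame; `x1` and the list used for F are placeholders, not the asker's data.

```python
import pandas as pd

# Placeholder data standing in for the real 118-column frame and F.
df = pd.DataFrame({"x1": [10, 20, 30]})
F = [1, 2, 3]

# True would mean the assignment below only overwrites existing values.
existed_before = "x119" in df.columns

n_before = len(df.columns)
df["x119"] = F
n_after = len(df.columns)

print(existed_before, n_before, n_after)
```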

Upvotes: 1
