Reputation: 478
I have a pandas DataFrame with 118 columns and I'd like to add a new column, 'x119'. I tried several methods, all of which seem to work, such as:
df = df.assign(x119=F)
or:
df.loc[:,'x119'] = F
These methods seem to add the column to df, but when I call:
df.describe()
I still get only 118 columns. Has anyone encountered this situation? The column seems to exist when I call df['x119'], but it does not appear in the output of df.describe().
EDIT: The values of F are categorical, with numeric values 1, 2, 3. The column 'x119' did not exist in df before, and when I use df2 = df and then call df2.describe(), it works fine and I can see all columns.
Upvotes: 1
Views: 1042
Reputation: 14689
df.describe() works fine after df.assign(...) for numeric datatypes, as long as the result of assign is assigned back to df. Here's a reproducible example:
>>> import pandas as pd
>>> df = pd.DataFrame([[1,2],[3,4]], columns=list('AB'))
>>> df
A B
0 1 2
1 3 4
>>> import numpy as np
>>> df["C"] = np.nan
>>> df
A B C
0 1 2 NaN
1 3 4 NaN
>>> df.describe()
A B C
count 2.000000 2.000000 0.0
mean 2.000000 3.000000 NaN
std 1.414214 1.414214 NaN
min 1.000000 2.000000 NaN
25% 1.500000 2.500000 NaN
50% 2.000000 3.000000 NaN
75% 2.500000 3.500000 NaN
max 3.000000 4.000000 NaN
>>> df.assign(D=5)
A B C D
0 1 2 NaN 5
1 3 4 NaN 5
>>> df.describe()
A B C
count 2.000000 2.000000 0.0
mean 2.000000 3.000000 NaN
std 1.414214 1.414214 NaN
min 1.000000 2.000000 NaN
25% 1.500000 2.500000 NaN
50% 2.000000 3.000000 NaN
75% 2.500000 3.500000 NaN
max 3.000000 4.000000 NaN
>>> df = df.assign(D=5)
>>> df.describe()
A B C D
count 2.000000 2.000000 0.0 2.0
mean 2.000000 3.000000 NaN 5.0
std 1.414214 1.414214 NaN 0.0
min 1.000000 2.000000 NaN 5.0
25% 1.500000 2.500000 NaN 5.0
50% 2.000000 3.000000 NaN 5.0
75% 2.500000 3.500000 NaN 5.0
max 3.000000 4.000000 NaN 5.0
>>>
For mixed object and numeric datatypes, you need to call df.describe(include='all'), as mentioned in the Notes section of the documentation:
For mixed data types provided via a DataFrame, the default is to return only an analysis of numeric columns. If include='all' is provided as an option, the result will include a union of attributes of each type.
>>> df["E"] = ['1','2']
>>> df
A B C D E
0 1 2 NaN 5 1
1 3 4 NaN 5 2
>>> df.describe()
A B C D
count 2.000000 2.000000 0.0 2.0
mean 2.000000 3.000000 NaN 5.0
std 1.414214 1.414214 NaN 0.0
min 1.000000 2.000000 NaN 5.0
25% 1.500000 2.500000 NaN 5.0
50% 2.000000 3.000000 NaN 5.0
75% 2.500000 3.500000 NaN 5.0
max 3.000000 4.000000 NaN 5.0
>>> df
A B C D E
0 1 2 NaN 5 1
1 3 4 NaN 5 2
>>>
so you need to call describe as follows:
>>> df.describe(include='all')
A B C D E
count 2.000000 2.000000 0.0 2.0 2
unique NaN NaN NaN NaN 2
top NaN NaN NaN NaN 2
freq NaN NaN NaN NaN 1
mean 2.000000 3.000000 NaN 5.0 NaN
std 1.414214 1.414214 NaN 0.0 NaN
min 1.000000 2.000000 NaN 5.0 NaN
25% 1.500000 2.500000 NaN 5.0 NaN
50% 2.000000 3.000000 NaN 5.0 NaN
75% 2.500000 3.500000 NaN 5.0 NaN
max 3.000000 4.000000 NaN 5.0 NaN
>>>
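Since the question's edit says F is categorical, the same default applies: describe() summarizes only numeric columns unless told otherwise. The following is a minimal sketch, assuming F is stored with a pandas categorical dtype; the small two-column frame and the values of F are hypothetical stand-ins for the real 118-column data.

```python
import pandas as pd

# Hypothetical stand-in for the real 118-column frame.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4.0, 5.0, 6.0]})

# Assumption: F holds numeric codes but has a categorical dtype.
F = pd.Categorical([1, 2, 3])
df["x119"] = F

# Default describe() keeps only the numeric columns.
default_cols = list(df.describe().columns)

# include="all" summarizes every column, whatever its dtype.
all_cols = list(df.describe(include="all").columns)

# include=["category"] summarizes only the categorical columns.
category_cols = list(df.describe(include=["category"]).columns)

# Casting the codes to integers makes the column numeric again.
numeric_cols = list(df.assign(x119=df["x119"].astype(int)).describe().columns)

print(default_cols)
print(all_cols)
print(category_cols)
print(numeric_cols)
```

If numeric summary statistics for x119 are actually wanted, the cast to int is the relevant line; otherwise include='all' or include=['category'] is enough to make the column show up.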
Upvotes: 1
Reputation: 863166
I think the problem may be that the x119 column already existed in df, so the assignment only overwrites its values.
You can check this before adding the column with:
print (df['x119'])
The simplest way to add a new column is:
print (len(df.columns))
df['x119'] = F
print (len(df.columns))
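The check above can also be done without triggering a KeyError, and the dtype of the new column can be inspected at the same time. This is only a sketch: the small frame and the values of F are hypothetical placeholders for the real data.

```python
import pandas as pd

# Hypothetical stand-ins for the real frame and for F.
df = pd.DataFrame({"x1": [1, 2], "x2": [3, 4]})
F = [1, 2]

# True only if the column was already present before the assignment.
existed_before = "x119" in df.columns

n_before = len(df.columns)
df["x119"] = F
n_after = len(df.columns)

print(existed_before)
print(n_after - n_before)  # 1 if a column was added, 0 if it was overwritten
print(df.dtypes)           # a non-numeric dtype here is skipped by describe()
```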
Upvotes: 1