Pandas series not showing small numbers in scientific notation depending on first entry

Question

UPDATE:

I have found the source of the problem: in the current version of pandas, DataFrame columns with the 'object' dtype are no longer displayed in scientific notation. Large values are displayed with the right significant figures, but small numbers are displayed as 0.0.

If you access the cell from the running script you still get the correct value. The problem is that if you save the dataframe, for example as a text file, the incorrect displayed value is what gets written.
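A minimal sketch of that distinction, using an illustrative object-dtype Series (the names `s`, `stored` and `text` are hypothetical). The stored cell keeps its full value, and passing a callable such as `str` as `float_format` to `to_string()` is one possible way to write each float's full representation to text:

```python
import pandas as pd

# Illustrative object-dtype Series mixing a string and floats.
s = pd.Series({'ion': 'H1', 'wavelength': 6563.0, 'intgr_flux': 3.128572e-14},
              dtype='object')

# The stored value is untouched; only the rendered text may be rounded.
stored = s['intgr_flux']
print(repr(stored))

# float_format takes a callable applied to float cells; str gives the
# shortest representation that round-trips.
text = s.to_string(float_format=str)
print(text)
```

The same `float_format` argument is also accepted by `DataFrame.to_string()`, so this may carry over to the dataframe case.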

This is a code example with the correct (for me) behaviour in previous versions:

import pandas as pd
print(f'pandas version {pd.__version__}')

idx = 'H1_6563A'
data = {'ion': 'H1',
        'wavelength': 6563.0,
        'latex_label': r'$6563\AA\,HI$',  # raw string: avoids invalid-escape warnings
        'intgr_flux': 3.128572e-14,
        'dist': 2.8e20,
        'eqw': 1464.05371}

mySeries = pd.Series(index=data.keys(), dtype='object')
for param, value in data.items():
    mySeries[param] = value
print(f'\nSeries: \n {mySeries}')

myDF = pd.DataFrame(columns=data.keys())
myDF.loc[idx] = mySeries
print(f'\nDataFrame:\n {myDF}')

In that version, the dataframe shows a mix of scientific and non-scientific floats:

pandas version 1.2.3

Series: 
 ion                                 H1
wavelength                      6563.0
latex_label              $6563\AA\,HI$
intgr_flux                         0.0
dist           280000000000000000000.0
eqw                         1464.05371
dtype: object

DataFrame:
          ion  wavelength    latex_label    intgr_flux          dist         eqw
H1_6563A  H1      6563.0  $6563\AA\,HI$  3.128572e-14  2.800000e+20  1464.05371

The same script in pandas 1.4.1 returns:

pandas version 1.4.1

Series: 
 ion                                 H1
wavelength                      6563.0
latex_label              $6563\AA\,HI$
intgr_flux                         0.0
dist           280000000000000000000.0
eqw                         1464.05371
dtype: object

DataFrame:
          ion wavelength    latex_label intgr_flux                     dist         eqw
H1_6563A  H1     6563.0  $6563\AA\,HI$        0.0  280000000000000000000.0   1464.05371

Could anyone share an approach to replicate the original behaviour, so that I can have a dataframe with mixed variables (strings, ints, floats, None, scientific, non-scientific) that shows the correct significant figures?

Thank you very much.

ORIGINAL QUESTION

I am using a pandas.Series as a container for entries of different types. I have noticed the following issue while declaring small floats in scientific notation:

import numpy as np
import pandas as pd

print(f'Pandas {pd.__version__}')

columns = ['c0', 'c1', 'c2', 'c3']
mySeries = pd.Series(index=columns)

mySeries['c0'] = 'None'
mySeries['c1'] = np.nan
mySeries['c2'] = 1234.0
mySeries['c3'] = 1.234e-18

print(mySeries)

which returns:

c0      None
c1       NaN
c2    1234.0
c3       0.0
dtype: object

Accessing the 'c3' entry returns the complete float. However, if you convert this series to a pandas.DataFrame and save it to a text file (using the .to_string() method), it is stored as 0.0.

If your first entry is a number, this does not happen:

columns = ['c0', 'c1', 'c2', 'c3']
mySeries = pd.Series(index=columns)

mySeries['c0'] = 123
mySeries['c1'] = np.nan
mySeries['c2'] = 1234.0
mySeries['c3'] = 1.234e-18

print(mySeries)

c0    1.230000e+02
c1             NaN
c2    1.234000e+03
c3    1.234000e-18
dtype: float64

So my question is: what is the right way to declare the dtype so that the entry order does not affect the display? I would also like to know which parameter decides whether a cell is shown in scientific notation.
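On the second point, pandas exposes display options that influence how floats are rendered, notably `display.precision` and `display.float_format`. A hedged sketch, assuming the illustrative Series `s` below and a format string chosen only for demonstration, of forcing scientific notation regardless of entry order:

```python
import pandas as pd

# Illustrative object-dtype Series whose first entry is a string.
s = pd.Series({'c0': 'None', 'c2': 1234.0, 'c3': 1.234e-18}, dtype='object')

# Temporarily render every float cell in scientific notation
# with three decimals.
with pd.option_context('display.float_format', '{:.3e}'.format):
    text = s.to_string()

print(text)
```

Using `option_context` keeps the change local to the `with` block; `pd.set_option` would apply it for the rest of the session.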

Thanks a lot.

Jblasco · Accepted Answer

I would shape my df first, with proper dtypes, then add the data:

import pandas as pd

df = pd.DataFrame(
    {'ion': pd.Series(dtype='str'), 
     'wavelength': pd.Series(dtype='float'), 
     'intgr_flux': pd.Series(dtype='float')})

idx = 'H1_6563A'
data = {
    'ion': 'H1',
    'wavelength': 6563.0,
    'intgr_flux': 3.128572e-14}

df.loc[idx] = data
print(df)

# Outputs:
#         ion  wavelength    intgr_flux
# H1_6563A  H1      6563.0  3.128572e-14
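If a DataFrame already exists with object columns, one possible variation on the same idea is to cast the numeric columns afterwards. This is a sketch assuming the column names from the example above; `obj_df` and `typed_df` are illustrative names:

```python
import pandas as pd

# An object-dtype frame, such as one might end up with after
# filling cells from mixed-type data.
obj_df = pd.DataFrame(
    {'ion': ['H1'], 'wavelength': [6563.0], 'intgr_flux': [3.128572e-14]},
    index=['H1_6563A'], dtype='object')

# Cast only the numeric columns; the string column is left as it is.
typed_df = obj_df.astype({'wavelength': 'float64', 'intgr_flux': 'float64'})

text = typed_df.to_string()
print(typed_df.dtypes)
print(text)
```

With float64 columns, the small flux value should again be rendered in scientific notation rather than as 0.0.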
