Pandas dataframe read large number as string

Question

I am creating a dataframe from a CSV file like this:

topcells=pd.DataFrame.from_csv("url/output_topcell.txt", header=0, sep=', ', parse_dates=True, encoding=None, tupleize_cols=False)

The column I am interested in (cell) contains long numbers (e.g. 6468716846847), which I need as strings.

After the dataframe is created, the column's dtype is numpy.float64 by default (the column also contains some NaN values).
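The float64 dtype is most likely caused by those NaN values: a plain int64 column cannot hold NaN, so pandas reads an integer column with missing entries as float64. A minimal sketch with hypothetical in-memory data (the `id` column is invented for illustration; only `cell` comes from the question):

```python
import io

import pandas as pd

# Hypothetical CSV data: every 'cell' value present vs. one 'cell' value missing.
complete = pd.read_csv(io.StringIO("id,cell\n1,6468716846847\n2,6468716846848\n"))
with_gap = pd.read_csv(io.StringIO("id,cell\n1,6468716846847\n2,\n"))

print(complete["cell"].dtype)  # int64: all values are integers
print(with_gap["cell"].dtype)  # float64: the empty field becomes NaN, forcing a float column
```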

When I use:

topcells.cell=topcells.cell.astype(str)

or:

topcells['cell']=topcells['cell'].apply(lambda x: str(x))

The string I get is not "6468716846847" but something like "6.468716846847e+12".

How can I avoid this scientific notation and get the full number as a string?

TomAugspurger · Accepted Answer

You should use the read_csv function from the top-level namespace (pd.read_csv). It has more options for reading, including a dtype parameter.

For example, with tst.csv:

c1,c2,c3,c4,c5
a,b,6468716846847,12,13
d,e,6468716846848,13,14

you get:

In [11]: pd.read_csv('tst.csv', dtype={'c3': 'str'})
Out[11]: 
  c1 c2             c3  c4  c5
0  a  b  6468716846847  12  13
1  d  e  6468716846848  13  14

[2 rows x 5 columns]
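The question's column also contains missing values. A minimal sketch, assuming data shaped like tst.csv but with one c3 field left empty (the data is hypothetical), suggests that dtype={'c3': str} keeps every digit of the present values while the empty field is still read as NaN rather than the literal string "nan":

```python
import io

import pandas as pd

# Hypothetical data in the same shape as tst.csv, with the second c3 value missing.
data = "c1,c2,c3,c4,c5\na,b,6468716846847,12,13\nd,e,,13,14\n"

df = pd.read_csv(io.StringIO(data), dtype={"c3": str})

# c3 is never parsed as a number, so no float conversion or scientific notation occurs.
print(df.loc[0, "c3"])           # 6468716846847
# Empty fields are still treated as missing values under the default na_values.
print(pd.isna(df.loc[1, "c3"]))  # True
```

Because the missing entry stays NaN, it can still be found with isna() or dropped with dropna() after reading.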
