Reputation: 8145
I am creating a DataFrame from a CSV file like this:
topcells=pd.DataFrame.from_csv("url/output_topcell.txt", header=0, sep=', ', parse_dates=True, encoding=None, tupleize_cols=False)
The column I am interested in (cell) contains long numbers (e.g. 6468716846847), which I need as strings.
After creating the DataFrame, the column's dtype is numpy.float64 by default (it includes some NaN values).
When I use:
topcells.cell=topcells.cell.astype(str)
or:
topcells['cell']=topcells['cell'].apply(lambda x: str(x))
The string I get is not "6468716846847" but something like "6.468716846847e+12".
How can I avoid this scientific notation and get the full number as a string?
Upvotes: 1
Views: 5507
Reputation: 28936
You should use the read_csv function from the top-level namespace. It has more options for reading, including a dtype parameter.
For example, with tst.csv:
c1,c2,c3,c4,c5
a,b,6468716846847,12,13
d,e,6468716846848,13,14
you get:
In [11]: pd.read_csv('tst.csv', dtype={'c3': 'str'})
Out[11]:
  c1 c2             c3  c4  c5
0  a  b  6468716846847  12  13
1  d  e  6468716846848  13  14
[2 rows x 5 columns]
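A self-contained variant of the same idea, in case it helps. It is only a sketch: the inline CSV text, the empty cell in the second row (standing in for the NaN values mentioned in the question), and the names csv_text, floats and as_text are made up for illustration. The second half assumes the column has already been loaded as float64 and that every value is a whole number small enough for float64 to hold exactly (below 2**53).

```python
import io

import pandas as pd

# Hypothetical data modelled on tst.csv; the second row has an empty c3
# to stand in for the missing values mentioned in the question.
csv_text = (
    "c1,c2,c3,c4,c5\n"
    "a,b,6468716846847,12,13\n"
    "d,e,,13,14\n"
)

# Reading c3 as str keeps the digits exactly as written in the file;
# the empty cell is still reported as missing.
df = pd.read_csv(io.StringIO(csv_text), dtype={"c3": str})
print(df["c3"].tolist())

# If the column is already float64, formatting each value with ".0f"
# avoids both the exponent and the trailing ".0"; NaN is passed through.
floats = pd.Series([6468716846847.0, float("nan")])
as_text = floats.map(lambda x: format(x, ".0f") if pd.notna(x) else x)
print(as_text.tolist())
```

Reading the column as str at load time is the safer of the two, since the value never goes through a float at all.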
Upvotes: 5