Reputation: 919
I have a large CSV (~20 million rows) and I'd like to convert one column from string to float. I do it this way:
df['sale']=df['sale'].str.replace(",", ".").astype('float32')
and sale looks like:
86,2600
20,2800
123,5000
30,7500
8,3600
The command seems unstable, i.e., it sometimes raises the following memory error:
MemoryError Traceback (most recent call last) in () ----> 1 df['sale']=df['sale'].str.replace(",", ".").astype('float32');
What exactly is this error and how can I fix it? Thank you!
Upvotes: 1
Views: 616
Reputation: 394199
Rather than converting after loading, which is a memory-intensive operation, you can specify that the decimal separator is European style by passing the parameter decimal=',' to read_csv:
pd.read_csv(FILENAME, decimal=',')
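Since the question targets float32, the same call can also request that dtype at load time. This is a minimal sketch, not a tested recipe for the real file: the ';' separator, the id column, and the inline sample data are assumptions standing in for the actual CSV, and only the sale column name comes from the question.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the real file: a ';'-separated CSV whose
# 'sale' column uses a comma as the decimal mark.
csv_text = "id;sale\n1;86,2600\n2;20,2800\n3;123,5000\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",                    # assumed field separator
    decimal=",",                # parse 86,2600 as 86.26
    dtype={"sale": "float32"},  # store the column as float32 directly
)
print(df.dtypes)
```

Reading the column as a number from the start means the full column of Python strings is never held in memory alongside its converted copy.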
Example:
In[24]:
import io
import pandas as pd

t="""data
86,2600
20,2800
123,5000
30,7500
8,3600"""
df = pd.read_csv(io.StringIO(t), decimal=',', sep=';')
df
Out[24]:
data
0 86.26
1 20.28
2 123.50
3 30.75
4 8.36
Note that I pass sep=';'; otherwise it will treat the above as two columns, because the default separator is a comma.
We can see that the output shows decimal values, and we can confirm the dtype using .info():
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 1 columns):
data 5 non-null float64
dtypes: float64(1)
memory usage: 120.0 bytes
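To get a feel for why converting after loading is memory-hungry, one can compare the footprint of the string column with that of the converted float32 column. This is only an illustrative sketch on repeated sample values from the question. The exact byte count for the strings depends on the pandas version and string storage, so no figure is claimed for it.

```python
import pandas as pd

# Sample values from the question, repeated to form a larger column.
raw = pd.Series(["86,2600", "20,2800", "123,5000", "30,7500", "8,3600"] * 1000)

# The post-load conversion from the question; it builds an intermediate
# column of strings before the float32 result exists.
as_float = raw.str.replace(",", ".", regex=False).astype("float32")

# deep=True counts the string payloads, not just the pointers to them.
string_bytes = raw.memory_usage(index=False, deep=True)
float_bytes = as_float.memory_usage(index=False, deep=True)
print(string_bytes, float_bytes)
```

float32 needs 4 bytes per value, while each string costs considerably more. During the conversion the original strings, the replaced strings, and the float result all exist at once, which is presumably where a 20-million-row column runs out of memory.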
Upvotes: 2