Reputation: 85
The code below reads a CSV file and writes its output to another CSV file. It worked perfectly fine, but once the CSV file grew (more rows) it started raising an error. I tried changing Xms to 512m, Xmx to 2024m and XX:ReservedCodeCacheSize to 480m, but I still get the memory error.
Traceback (most recent call last):
  File "/root/PycharmProjects/AppAct/statfile.py", line 5, in <module>
    df = df.astype(float)
  File "pandas/core/generic.py", line 5691, in astype
    **kwargs)
  File "pandas/core/internals/managers.py", line 531, in astype
    return self.apply('astype', dtype=dtype, **kwargs)
  File "pandas/core/internals/managers.py", line 402, in apply
    bm._consolidate_inplace()
  File "pandas/core/internals/managers.py", line 929, in _consolidate_inplace
    self.blocks = tuple(_consolidate(self.blocks))
  File "pandas/core/internals/managers.py", line 1899, in _consolidate
    _can_consolidate=_can_consolidate)
  File "pandas/core/internals/blocks.py", line 3149, in _merge_blocks
    new_values = new_values[argsort]
MemoryError
import pandas as pd
all_df = pd.read_csv("/root/Desktop/Time-20ms/AllDataNew20ms.csv")
df = all_df.loc[:, all_df.columns != "activity"]
df = df.astype(float)
mask = (df != 0).any(axis=1)
df = df[mask]
recover_lines_of_activity_column = all_df["activity"][mask]
final_df = pd.concat([recover_lines_of_activity_column, df], axis=1)
final_df.to_csv("/root/Desktop/Dataset.csv", index=False)
Upvotes: 0
Views: 234
Reputation: 169338
Changing your PyCharm memory limits (as -Xms and other JVM settings do) will have absolutely no effect on the Python interpreter that's actually running your Python code.
Plain and simple, you're running out of system memory when you're converting your entire dataframe to floats (df = df.astype(float)).
Beyond changing your code to do things more efficiently, you could add physical memory or enable swap.
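As one possible way to "do things more efficiently", here is a minimal sketch that processes the file in chunks so that only a slice of the rows is in memory at a time. The function name filter_csv_in_chunks and the chunk size of 100,000 rows are assumptions for illustration, not something from the question; it is meant to produce the same kind of output as the original script (the "activity" column first, rows that are all zero dropped).

```python
import pandas as pd


def filter_csv_in_chunks(src, dst, label_col="activity", chunksize=100_000):
    """Copy src to dst, dropping rows whose numeric columns are all zero.

    Hypothetical helper: only `chunksize` rows are held in memory at once.
    """
    first = True
    for chunk in pd.read_csv(src, chunksize=chunksize):
        # Convert only this chunk's numeric columns to float.
        numeric = chunk.drop(columns=label_col).astype(float)
        # Keep rows where at least one numeric value is non-zero.
        mask = (numeric != 0).any(axis=1)
        kept = pd.concat([chunk.loc[mask, label_col], numeric[mask]], axis=1)
        # Write the header once, then append the following chunks.
        kept.to_csv(dst, mode="w" if first else "a", header=first, index=False)
        first = False
```

With the paths from the question this would be called as filter_csv_in_chunks("/root/Desktop/Time-20ms/AllDataNew20ms.csv", "/root/Desktop/Dataset.csv"). A smaller chunksize lowers peak memory at the cost of more write calls.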
(Also, surely you're using a 64-bit Python?)
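If you're not sure, a quick check like the following should tell you. It only uses the standard library; the function name python_bitness is made up for this example. A 32-bit interpreter is limited to a few gigabytes of address space no matter how much RAM the machine has.

```python
import struct
import sys


def python_bitness() -> int:
    # Size of a C pointer in bytes, times 8: 32 or 64 on common platforms.
    return struct.calcsize("P") * 8


print(python_bitness(), "bit interpreter")
# sys.maxsize exceeds 2**32 only on a 64-bit build.
print("64-bit:", sys.maxsize > 2**32)
```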
One easy optimization would be to do less work copying and converting the data: pass dtype=... directly to pd.read_csv(). See this answer, for example.
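A rough sketch of what passing dtype could look like for the script in the question. The helper names load_with_dtypes and drop_all_zero_rows are invented here, and the sketch assumes that every column except "activity" is numeric, as the original astype(float) call implies. It reads only the header first to get the column names, then lets read_csv parse the numeric columns as floats directly, so the separate astype copy is not needed.

```python
import pandas as pd


def load_with_dtypes(path, label_col="activity", numeric_dtype="float64"):
    # nrows=0 reads just the header, so this first pass is cheap.
    columns = pd.read_csv(path, nrows=0).columns
    # Assumption: every column other than label_col holds numbers.
    dtypes = {col: numeric_dtype for col in columns if col != label_col}
    return pd.read_csv(path, dtype=dtypes)


def drop_all_zero_rows(df, label_col="activity"):
    # Keep rows where at least one numeric column is non-zero.
    mask = (df.drop(columns=label_col) != 0).any(axis=1)
    return df[mask]


# Usage with the paths from the question:
# df = load_with_dtypes("/root/Desktop/Time-20ms/AllDataNew20ms.csv")
# drop_all_zero_rows(df).to_csv("/root/Desktop/Dataset.csv", index=False)
```

Unlike the original script, this keeps the columns in their original order instead of moving "activity" to the front. If the precision is acceptable for your data, passing numeric_dtype="float32" would roughly halve the memory used by the numeric columns.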
Upvotes: 0