user3535695

Reputation: 85

Memory pycharm python

The code below reads a CSV file and writes the output to another CSV file. It worked perfectly fine until the CSV file grew (more rows); now it fails with a MemoryError. I tried changing Xms to 512m, Xmx to 2024m and XX:ReservedCodeCacheSize to 480m, but I still get the memory error.

Traceback (most recent call last):
File "/root/PycharmProjects/AppAct/statfile.py", line 5, in <module>
   df = df.astype(float)
File "pandas/core/generic.py", line 5691, in astype
   **kwargs)
File "pandas/core/internals/managers.py", line 531, in astype
   return self.apply('astype', dtype=dtype, **kwargs)
File "pandas/core/internals/managers.py", line 402, in apply
   bm._consolidate_inplace()
File "pandas/core/internals/managers.py", line 929, in _consolidate_inplace
   self.blocks = tuple(_consolidate(self.blocks))
File "pandas/core/internals/managers.py", line 1899, in _consolidate
   _can_consolidate=_can_consolidate)
File "pandas/core/internals/blocks.py", line 3149, in _merge_blocks
   new_values = new_values[argsort]
MemoryError

import pandas as pd

all_df = pd.read_csv("/root/Desktop/Time-20ms/AllDataNew20ms.csv")
df = all_df.loc[:, all_df.columns != "activity"]
df = df.astype(float)
mask = (df != 0).any(axis=1)
df = df[mask]
recover_lines_of_activity_column = all_df["activity"][mask]
final_df = pd.concat([recover_lines_of_activity_column, df], axis=1)
final_df.to_csv("/root/Desktop/Dataset.csv", index=False)

Upvotes: 0

Views: 234

Answers (1)

AKX

Reputation: 169338

Changing your PyCharm memory limits (which is what -Xms, -Xmx and the other JVM settings do) will have absolutely no effect on the Python interpreter that's actually running your Python code. Those settings only size the JVM that the IDE itself runs in.

Plain and simple, you're running out of system memory when you're converting your entire dataframe to floats (df = df.astype(float)).

Beyond changing your code to do things more efficiently, you could add physical memory or enable swap.
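As one possible way of "doing things more efficiently", here is a rough sketch that processes the file in chunks so that only a slice of it is in memory at any time. Chunking is my own suggestion here, not something the traceback demands; the function name filter_nonzero_rows, the chunksize default and the tiny sample file are made up for illustration. It is meant to keep the same rows and column order as the code in the question.

```python
import os
import tempfile

import pandas as pd


def filter_nonzero_rows(src_path, dst_path, label_column="activity", chunksize=100_000):
    """Copy rows that have at least one non-zero feature, one chunk at a time."""
    first = True
    for chunk in pd.read_csv(src_path, chunksize=chunksize):
        # Convert only this chunk's feature columns to float.
        features = chunk.loc[:, chunk.columns != label_column].astype(float)
        mask = (features != 0).any(axis=1)
        # Same layout as the original script: label column first, then features.
        kept = pd.concat([chunk.loc[mask, label_column], features[mask]], axis=1)
        # Write the header once, then append the following chunks.
        kept.to_csv(dst_path, mode="w" if first else "a", header=first, index=False)
        first = False


# Tiny made-up sample, just to show the call.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.csv")
    dst = os.path.join(tmp, "out.csv")
    with open(src, "w") as handle:
        handle.write("activity,x,y\nwalk,0,0\nrun,1,2\nsit,0,3\n")
    filter_nonzero_rows(src, dst, chunksize=2)
    result = pd.read_csv(dst)
```

Peak memory then depends on chunksize rather than on the size of the whole file, at the cost of a slightly longer script.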

(Also, surely you're using a 64-bit Python?)
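If you're not sure, a quick check using only the standard library looks something like this (the variable names are just for illustration):

```python
import struct
import sys

# Size of a C pointer in bits: 64 on a 64-bit interpreter, 32 on a 32-bit one.
pointer_bits = struct.calcsize("P") * 8

# Equivalent check: sys.maxsize only exceeds 2**32 on 64-bit builds.
is_64bit = sys.maxsize > 2**32

print(f"{pointer_bits}-bit Python")
```

Run it with the same interpreter PyCharm uses for the project, since that may differ from the system default.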

One easy optimization would be to do less work copying and converting the data – pass in dtype=... directly to pd.read_csv(). See this answer, for example.
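A minimal sketch of that idea, assuming every column except "activity" is numeric (as the astype(float) call in the question implies). The helper name read_csv_as_float and the sample file are hypothetical; dtype= and nrows= are regular pd.read_csv() parameters.

```python
import os
import tempfile

import pandas as pd


def read_csv_as_float(path, label_column="activity"):
    """Read a CSV with every column except the label parsed directly as float."""
    # Read just the header row to learn the column names.
    columns = pd.read_csv(path, nrows=0).columns
    dtypes = {name: "float64" for name in columns if name != label_column}
    # Parsing straight to float avoids the separate astype(float) copy.
    return pd.read_csv(path, dtype=dtypes)


# Tiny made-up sample, just to show the call.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.csv")
    with open(sample_path, "w") as handle:
        handle.write("activity,x,y\nwalk,0,0\nrun,1,2\n")
    sample = read_csv_as_float(sample_path)
```

If the reduced precision is acceptable for your data, "float32" in place of "float64" would roughly halve the memory used by the numeric columns.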

Upvotes: 0
