Navy Seal

Reputation: 125

Large File crashing on Jupyter Notebook

I have a very simple task: I need to take the sum of one column in a file that has many columns and thousands of rows. However, every time I open the file in Jupyter, it crashes since I cannot go over 100 MB per file.

Is there any workaround for such a task? I feel I shouldn't have to open the entire file since I only need one column.

Thanks!

Upvotes: 1

Views: 11208

Answers (3)

loving_guy

Reputation: 383

You should slice the rows into several smaller data frames and then work on each of those separately. The hanging/crashing is caused by insufficient RAM on your system.

Use the new_dataframe = dataframe.iloc[:, :] or new_dataframe = dataframe.loc[:, :] methods for slicing in pandas.

Row slicing goes before the colon and column slicing after it.
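
For example, a rough sketch of that syntax (the file name, row count, and column label below are placeholders):

import pandas as pd

dataframe = pd.read_csv('big_file.csv')       # placeholder file name
new_dataframe = dataframe.iloc[:50_000, :]    # row slice before the comma, column slice after
amount_column = dataframe.loc[:, 'amount']    # a single column selected by label
print(amount_column.sum())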

Upvotes: 1

Yasin Yousif

Reputation: 967

You have to open the file even if you want just one column; opening it loads it into memory, and that is where your problem lies.

You can either open the file outside IPython and split it into smaller files, or use a library like pandas and read it in chunks, as in the answer.
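
A minimal sketch of the chunked approach, assuming the file is a CSV and the column to sum is called 'amount' (both are placeholders):

import pandas as pd

total = 0
# usecols loads only the needed column; chunksize yields DataFrames of 100,000 rows at a time
for chunk in pd.read_csv('big_file.csv', usecols=['amount'], chunksize=100_000):
    total += chunk['amount'].sum()
print(total)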

Upvotes: 2

Matt Elgazar

Reputation: 725

I'm not sure if this will work since the information you have provided is somewhat limited, but if you're using Python 3, I had a similar issue. Try adding this at the top of your notebook and see if it fixes the crash.

import os
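# allow duplicate copies of Intel's OpenMP runtime to be loaded instead of aborting the kernel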
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'

The above solution is sort of a band-aid: it isn't supported and may cause undefined behavior. If your data is too big for your memory, try reading it in with dask.

import dask.dataframe as dd
ddf = dd.read_csv(path)  # also accepts most pandas.read_csv keyword arguments
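
For the original task of summing one column, a minimal sketch with dask (the file and column names are placeholders):

import dask.dataframe as dd

ddf = dd.read_csv('big_file.csv', usecols=['amount'])  # only the needed column is read, in partitions
print(ddf['amount'].sum().compute())                   # the sum is evaluated lazily, partition by partition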

Upvotes: 2
