Reputation: 51
My server has 8 GB of RAM and I am using pandas' read_csv function to read a CSV file into a dataframe, but the process gets killed (prints "Killed") for CSV files larger than about 900 MB.
Can anyone help me handle this situation? I am attaching my meminfo (memory info image) for advice on how to free memory on the server.
Upvotes: 2
Views: 3835
Reputation: 383
In my case, it was a memory-related issue. Setting the nrows parameter in pd.read_csv limits how many rows are read. It's not a solution, but I was able to debug this way.
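For example, a minimal sketch (the file name and row count here are just placeholders):
import pandas as pd

# Read only the first 100,000 rows to check whether the full file is what exhausts memory.
df = pd.read_csv('data.csv', nrows=100_000)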
Upvotes: 0
Reputation: 2122
pandas can return an iterator for large files.
import pandas as pd
foo = pd.read_csv('bar.csv', iterator=True, chunksize=1000)
This will return an iterator. You can then apply operations to the data in chunks using a for loop. It therefore does not read the whole file into memory at once. The chunk size is the number of rows per chunk.
It will be something like this:
for chunk in foo:
    # do something with chunk
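For example, a minimal sketch that totals one numeric column across chunks (the column name 'foo' and the sum are just placeholders for whatever operation you need):
import pandas as pd

total = 0
# Only one chunk of 1000 rows is held in memory at a time.
for chunk in pd.read_csv('bar.csv', chunksize=1000):
    total += chunk['foo'].sum()
print(total)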
EDIT: To the best of my knowledge, you will have to apply functions like unique in chunks as well.
import numpy as np

unique_foo = []
for chunk in foo:
    # collect the unique values seen in each chunk
    unique_foo.append(chunk['foo'].unique())
# combine the per-chunk results and deduplicate across chunks
unique_foo = np.unique(np.concatenate(unique_foo))
Upvotes: 4
Reputation: 2023
(You should be a little more specific about what code you're typing, and what kind of error you're receiving.)
If pandas is not working with a file that large, you should fall back to the more basic csv package. You can still load the result into a DataFrame if you feel more comfortable that way.
Something like:
with open("file.csv", 'rb') as csv_file:
reader = csv.reader(csv_file, delimiter=',')
df = pd.DataFrame(list(reader))
Upvotes: 0