Reputation: 4797
I'm trying to run the following piece of code:
start_time = time.time()
csvWriter = ModalitySessions.pivot(index='session_id', columns='context_eid', values='name')
print("--- %s seconds ---" % (time.time() - start_time))
which gives me the following error:
ValueError: negative dimensions are not allowed
I found a similar problem online, and it seemed that the cause might be an underlying memory issue. So I tried running the same code on a subset of the data, and it is indeed a memory issue. Here's the updated code:
start_time = time.time()
csvWriter = ModalitySessions.iloc[:2000000].pivot(index='session_id', columns='context_eid', values='name')
print("--- %s seconds ---" % (time.time() - start_time))
This gives me a MemoryError.
Would anyone have any idea how to fix this? I'm dealing with ~3.5 million sessions, and the pivot should return about 900 columns.
Upvotes: 0
Views: 152
Reputation: 8143
You could construct a Python generator that returns the CSV data one chunk at a time. In fact, this is the kind of problem generators exist to solve. The generator can then be used to limit the number of rows loaded into memory at any time.
Either that or, as I mentioned in my comment, look into a high-memory VPS.
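To make the generator idea concrete, here is a minimal sketch using only the standard library. The function name iter_csv_chunks and the chunk_size parameter are made up for illustration, and the column names are taken from the question. If the data is loaded with pandas anyway, pandas.read_csv(..., chunksize=n) gives a similar iterator of DataFrames.

```python
import csv
import io
from itertools import islice


def iter_csv_chunks(lines, chunk_size):
    """Yield lists of at most chunk_size rows (as dicts) from CSV lines.

    `lines` can be an open file or any iterable of text lines.
    Only one chunk is held in memory at a time.
    """
    reader = csv.DictReader(lines)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return  # input exhausted
        yield chunk


# Tiny in-memory stand-in for the real CSV file (hypothetical data).
sample = io.StringIO(
    "session_id,context_eid,name\n"
    "1,a,x\n"
    "1,b,y\n"
    "2,a,z\n"
)
chunks = list(iter_csv_chunks(sample, chunk_size=2))
```

With a real file you would pass the open file object instead of the StringIO and process each chunk inside a for loop rather than collecting them all into a list.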
Upvotes: 1
Reputation: 51
You can break the original data into smaller chunks, pivot each chunk, and append the results to a container that you initialize as empty. You would of course need a function that handles appending the fragments: something that iterates over the elements of the container table you are building, compares them with the elements of the newly pivoted chunk, and adds the values where the column fields match. This is inefficient in terms of computation time (how inefficient depends on how many chunks you split the initial table into), but I think it would work around your problem, since you would be dealing with smaller chunks of data at a time. The original error appears to be some kind of wraparound error: 3.5 million rows × 900 columns is about 3.15 billion cells, which exceeds the largest signed 32-bit integer (2,147,483,647).
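A rough sketch of the chunked approach, with two deliberate simplifications that are my assumptions rather than part of the description above. First, the data is split by blocks of session_id instead of arbitrary row ranges, so no session is spread across two chunks and no merge step is needed. Second, each pivoted block is appended to a CSV target instead of an in-memory container, since the full 3.5 million × 900 result may not fit in RAM either. The names pivot_in_session_blocks, out and sessions_per_block are hypothetical. It also assumes each (session_id, context_eid) pair occurs at most once, which pivot requires.

```python
import io

import pandas as pd


def pivot_in_session_blocks(df, out, sessions_per_block=100_000):
    """Pivot df one block of sessions at a time, appending each block to `out` as CSV."""
    # Fix the column order up front so every block is written with the same layout.
    all_columns = sorted(df["context_eid"].unique())
    sessions = df["session_id"].unique()
    for start in range(0, len(sessions), sessions_per_block):
        block_ids = sessions[start:start + sessions_per_block]
        block = df[df["session_id"].isin(block_ids)]
        wide = block.pivot(index="session_id", columns="context_eid", values="name")
        # Add columns missing from this block (filled with NaN).
        wide = wide.reindex(columns=all_columns)
        # Write the header only once, with the first block.
        wide.to_csv(out, header=(start == 0))


# Small made-up frame standing in for ModalitySessions.
demo = pd.DataFrame({
    "session_id": [1, 1, 2, 3],
    "context_eid": ["a", "b", "a", "c"],
    "name": ["x", "y", "z", "w"],
})
buf = io.StringIO()
pivot_in_session_blocks(demo, buf, sessions_per_block=2)
buf.seek(0)
result = pd.read_csv(buf, index_col="session_id")
```

For the real data you would pass an open file (for example open("pivot.csv", "w", newline="")) as out and tune sessions_per_block to what your memory allows.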
Upvotes: 1