Reputation: 65
I've run into a problem I've never had before.
I'm just trying to save a dataframe as a CSV with .to_csv(), but after hours it is still running.
My dataframe contains all the posts from Stack Overflow for the last year, along with their associated tags. I used a neural network, Sentence-BERT, to embed each post as a vector; the vector size for each post is 768.
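Roughly, my setup looks like this (a simplified sketch; the model name, column names, and toy data below are placeholders, not my real pipeline):

```python
import pandas as pd
from sentence_transformers import SentenceTransformer

# A Sentence-BERT model that produces 768-dimensional embeddings
# (this particular model name is just an example).
model = SentenceTransformer("all-mpnet-base-v2")

posts = ["How do I save a dataframe?", "What is a pickle file?"]  # toy data
embeddings = model.encode(posts)  # shape: (n_posts, 768)

# One row per post: the text, its tags, and its 768-dim embedding.
df = pd.DataFrame({"post": posts, "tags": [["pandas"], ["python"]]})
df["embedding"] = list(embeddings)

df.to_csv("posts.csv", index=False)  # this is the step that never finishes
```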
So my final dataframe looks like this (screenshot of the dataframe omitted), with 1,194,445 rows.
Is it because it's too big? If so, are there any other solutions for saving this dataframe?
Thanks!
Upvotes: 0
Views: 172
Reputation: 169124
A text CSV file with 1.2 million rows, each containing, say, 512 bytes of other data and a 768-element embedding in text format (assuming each number takes about 12 bytes to print, delimiters included), works out to about 11 gigabytes:
>>> (768*12 + 512) * 1194445
11619560960
Writing that will take a while, and reading it back in will take another long while.
Use a binary format instead for data like this, e.g. pickles via to_pickle() (or something more advanced, such as Parquet, if you feel like it).
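For example (a minimal sketch; the file names and toy dataframe are illustrative, and to_parquet() additionally requires pyarrow or fastparquet to be installed):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real ~1.2M-row dataframe: a text column,
# a tags column, and a 768-dim embedding per row.
df = pd.DataFrame({
    "post": ["first post", "second post"],
    "tags": [["python"], ["pandas"]],
    "embedding": list(np.random.rand(2, 768)),
})

# Pickle: binary, no float-to-text conversion, preserves dtypes
# (including the numpy arrays) exactly. Python-only format.
df.to_pickle("posts.pkl")
df_back = pd.read_pickle("posts.pkl")

# Parquet (the "more advanced" route): compressed and columnar,
# readable from other languages and tools.
df.to_parquet("posts.parquet")
```

The win comes from skipping the conversion of every float to its decimal text representation (and back when loading), which is most of what to_csv() spends its time on here.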
Upvotes: 1