Reputation: 24506
How can I open a file that is 800 petabytes?
It's a file for a data science competition: 807167556410028 kB ≈ 807,000 TB ≈ 800 PB.
It's compressed into a 600 MB archive, but I can't unzip it because of the size. Is it possible to read the first 1000 rows from the zipped archive with pandas?
Upvotes: 1
Views: 97
Reputation: 24506
import zipfile
import pandas as pd

# Open the CSV inside the archive without extracting it to disk.
archive = zipfile.ZipFile('bigfile.zip')
file = archive.open('big.csv')

# Read lazily in chunks instead of loading the whole file into memory.
textfilereader = pd.read_csv(file, chunksize=1000000)
df = textfilereader.get_chunk()
# df now is a DataFrame containing the first chunk of rows.
This is somewhat of a partial answer, as it only reads the first chunksize rows.
P.S. I tested it with 3 million rows; it fails with a memory error.
P.P.S. It was a bug in my WinRAR archiver! I installed 7-Zip and it shows the file is only 5 GB. Lol, good lesson to learn: sometimes it's the program, not the dataset!
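As a possible alternative: if the zip contains a single CSV, pandas can read it directly via `compression='zip'`, and `nrows` limits parsing to exactly the first N rows, which matches what the question asks for. A minimal sketch (the archive/CSV names `bigfile.zip`/`big.csv` are the hypothetical ones from the question; here a small in-memory zip stands in for them):

```python
import io
import zipfile
import pandas as pd

# Build a small zip archive in memory to stand in for 'bigfile.zip'.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('big.csv', 'a,b\n' + '\n'.join(f'{i},{i * 2}' for i in range(5000)))
buf.seek(0)

# pandas decompresses the single-file zip on the fly;
# nrows stops parsing after the first 1000 data rows.
df = pd.read_csv(buf, compression='zip', nrows=1000)
print(len(df))  # 1000
```

With a file on disk this would just be `pd.read_csv('bigfile.zip', compression='zip', nrows=1000)`; unlike `chunksize`, `nrows` returns a plain DataFrame rather than an iterator.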
Upvotes: 1