Reputation: 1311
I have a 500 MB gz file, which I split as follows:
split -b 100m "file.gz" "file1.gz.part-"
After splitting, I get the following files:
file1.gz.part-aa
file1.gz.part-ab
file1.gz.part-ac
file1.gz.part-ad
file1.gz.part-ae
I am trying to iterate over the objects in the gzip file using Python's gzip module, as follows:
import gzip
with gzip.open(filename) as f:
    for line in f:
        ...  # process each line here
This works for file1.gz.part-aa, but for the other 4 parts I get a "Not a gzipped file" error.
Upvotes: 1
Views: 295
Reputation: 11607
You can split before you gzip:
split -l 300000 "file.txt" "tweets1.part-"  # -l 300000: start a new part every 300000 lines
Notice that the input of split is NOT a *.gz file but the original line-oriented file.
Then gzip every part separately:
gzip tweets1.part-*
This replaces each uncompressed part with its .gz version (gzip's -k/--keep option keeps the originals).
In Python, you can now consume each part separately.
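For example, here is a minimal sketch of reading the separately gzipped parts one after another. The helper name iter_lines, the glob pattern, and the UTF-8 encoding are assumptions for illustration, not something stated above:

```python
import glob
import gzip


def iter_lines(pattern):
    """Yield lines from every gzip file matching `pattern`, in name order."""
    # split names its parts aa, ab, ac, ..., so sorting by name keeps the order.
    for path in sorted(glob.glob(pattern)):
        # "rt" opens in text mode, so each line is a str (UTF-8 is assumed here).
        with gzip.open(path, "rt", encoding="utf-8") as f:
            for line in f:
                yield line.rstrip("\n")
```

It could be used as: for line in iter_lines("tweets1.part-*.gz"): ... Each part is a complete gzip file on its own, so every one of them opens without the "Not a gzipped file" error.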
Upvotes: 1
Reputation: 798696
A gzip file has a header that identifies it as a gzip file. After splitting, only the first file will have this header. Rejoin the files before processing.
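On the command line, cat file1.gz.part-* > file.gz rejoins the parts. The same thing can be sketched in Python as below; the function name rejoin_parts and the output filename are hypothetical, and the sketch assumes the parts sort correctly by name (as split's aa, ab, ... suffixes do):

```python
import glob
import gzip
import shutil


def rejoin_parts(pattern, joined_path):
    """Concatenate the split parts (sorted by name) back into one file."""
    parts = sorted(glob.glob(pattern))
    with open(joined_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Copy the raw bytes unchanged; only the joined file is valid gzip.
                shutil.copyfileobj(src, out)
    return parts
```

After rejoin_parts("file1.gz.part-*", "file_joined.gz"), the joined file can be opened with gzip.open("file_joined.gz") and iterated line by line as in the question.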
Upvotes: 1