Reputation: 85
I'm combining ~200 CSV files (most 10 to 200 MB) into a single file on a flash drive using chunking (with Python 3.7.6 in IPython on macOS).
It's gotten up to a combined file of size 4.29 GB. When I try to write another file (size ~150 MB) onto it, I get OSError: [Errno 27] File too large. Here's the code:
import pandas as pd
import os

paths_to_combine = ['file1.csv', ..., 'file200.csv']  # contains 200 files
output_path = 'all.csv'

for file in paths_to_combine:
    # Read each input in 50,000-row chunks to keep memory use low
    chunk_container = pd.read_csv(file, chunksize=50000)
    for chunk in chunk_container:
        # Append each chunk to the combined file without writing a header row
        chunk.to_csv(output_path, mode="a", header=False)
From reading similar questions (here and here), it seems there may be an addressing issue when writing to files above 4 GB. Since the problem is with the OS, I'm stuck on what to try next. Thank you for your help!
Upvotes: 0
Views: 1401
Reputation: 5331
Your problem doesn't have to do with macOS. It has to do with the formatting of the drive to which you want to write. macOS doesn't have limitations on file size, but it does have two default file systems: HFS+ and APFS. Both of them support file sizes into the exabyte level. We will not need to worry about file size for some time.
Your question implies that you want to write to a flash drive. (Edit. Per clarifications, you are writing to a flash drive.) The problem almost certainly is that the flash drive is formatted as FAT32, which caps a single file at 4 GiB minus one byte (2^32 - 1 bytes, about 4.29 GB) because the format stores each file's size in a 32-bit field. That matches the 4.29 GB at which your combined file stopped growing.
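To make the arithmetic concrete, here is a minimal sketch of the limit and of how the error surfaces in Python. The helper names (fits_on_fat32, is_file_too_large) are hypothetical and only for illustration; the sizes in the usage note are rough figures taken from the question.

```python
import errno

# FAT32 stores each file's size in a 32-bit field, so the largest
# possible file is 2**32 - 1 bytes (about 4.29 GB).
FAT32_MAX_FILE_SIZE = 2**32 - 1


def fits_on_fat32(current_size, bytes_to_append):
    """Return True if the file would still be within the FAT32 limit after appending."""
    return current_size + bytes_to_append <= FAT32_MAX_FILE_SIZE


def is_file_too_large(exc):
    """Return True if an OSError is the 'File too large' error (EFBIG)."""
    return exc.errno == errno.EFBIG
```

With a combined file of roughly 4,290,000,000 bytes and another 150,000,000 bytes to append, fits_on_fat32 returns False, which is consistent with the write failing at that point.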
If you are moving your data to copy it to a separate computer that does not itself use FAT32 (e.g. Windows with NTFS or another Mac with HFS+ or APFS), then you should just reformat the drive. If the other computer uses FAT32, you're out of luck (and by now you should get an upgrade).
To reformat your flash drive:
Copy anything you don't want to lose to another drive (or your local computer). Reformats will erase the drive: if you do not want to lose something, do not keep it on the drive you are about to erase.
On your Mac, open Disk Utility. On the left-hand side, under the "External" section, select your flash drive after it is plugged in, then on the top right, click Erase.
Put in a new name for your drive. Under "Format", I would recommend selecting "ExFAT". If you select "APFS" or "Mac OS Extended", your drive will not be readable by a Windows computer (without workarounds). Do not select "Windows NT Filesystem": you will not be able to write to it on a Mac. I would also keep the default GUID Partition Map scheme.
After it is finished reformatting, update your df.to_csv(path) path and it should write just fine without the OSError. Of course, writing may take some time. It may be strategic to pass a path with .gz or .xz at the end to use compression and save IO time (at the expense of CPU time).
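As a rough illustration of the compression idea, here is a sketch that uses only the standard library's gzip module instead of pandas. The function name combine_to_gzip and its skip_header parameter are hypothetical. It assumes every input file ends with a newline and starts with a single header row. Unlike the pandas loop in the question, it copies rows byte for byte and does not add an index column.

```python
import gzip
import shutil


def combine_to_gzip(input_paths, output_path, skip_header=True):
    """Stream several text files into one gzip-compressed file."""
    with gzip.open(output_path, "wb") as out:
        for path in input_paths:
            with open(path, "rb") as src:
                if skip_header:
                    # Discard the first line (assumed to be the header row)
                    src.readline()
                # Copy the rest in fixed-size blocks, so memory use stays low
                shutil.copyfileobj(src, out)
```

Calling combine_to_gzip(paths_to_combine, 'all.csv.gz') would write a single compressed file. How much space and IO time this saves depends on how compressible the data is.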
Note that if you want to read exFAT on an older (software-side at least) Linux machine, you'll get an error and you will need to install the exFAT utilities. But newer Linux kernels should have exFAT support embedded.
Upvotes: 3