Reputation: 31593
Why is the size of files capped at 4 GB when outputting to a file using print? I would expect that with streaming output it should be possible to generate files of arbitrary size.
Update: ijw and Chas. Owens were correct. I thought the F: drive was NTFS formatted, but in fact it used the FAT32 filesystem. I tried it on another drive and I could generate a 20 GB text file. There are no limits in this case. Apologies to all.
Details: while researching an answer to a question here on Stack Overflow, I needed to measure the performance of reading a very large text file using Perl. To test the reading I needed a large text file, so I wrote a small Perl script to generate one and ran into an unexpected problem. The output file grows until it reaches 4 GB. According to Windows Explorer the size in one run of the script was 4294967269 bytes (and 4294967296 bytes on disk). The script continues, but the file no longer grows.
Essentially, it is just a number of calls like:
print NUMBERS_OUTFILE $line;
where $line is a long string with a "\n" at the end. The length of the line can be configured and is not critical for this problem; e.g. 250 characters or 34000 characters. NUMBERS_OUTFILE is a file handle created with:
open ( NUMBERS_OUTFILE, '>F:\temp2\out1.txt' )
Drive F: is NTFS formatted and is on a separate physical hard disk from the disk with the operating system.
What is the reason and is there a work-around?
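For reference, the write loop described above can be sketched roughly as follows. This is not the original script: the temporary file, the tiny target size, and the variable names `$line_length`, `$target_bytes` and `$written` are assumptions made so the sketch runs anywhere. It also checks the return values of `print` and `close`. Because output is buffered, a failed write may only be reported on a later `print` or on `close`, so both are checked.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-ins for the real script's settings.
my $line_length  = 250;       # characters per line, including the "\n"
my $target_bytes = 100_000;   # kept tiny here; the real test wanted many GB

# Assumption: a temporary file is used instead of F:\temp2\out1.txt.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;   # no "\n" -> "\r\n" translation on Windows, so sizes are exact

my $line    = ('x' x ($line_length - 1)) . "\n";
my $written = 0;
while ($written < $target_bytes) {
    # Check every print; a full or size-limited filesystem shows up here or at close.
    print {$fh} $line or die "print failed after $written bytes: $!";
    $written += length $line;
}
close $fh or die "close failed after $written bytes: $!";

my $size = -s $path;
print "wrote $written bytes, file size is $size bytes\n";
```

Comparing the byte count the script believes it wrote with the size reported by `-s` makes a silent stall like the one described above visible.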
Full Perl script and BAT driver script (HTML formatted with the pre tag). If the two environment variables MBSIZE and OUTFILE are set up, then the Perl script should be able to run unchanged on platforms other than Windows.
Platform: Perl 5.10.0 from ActiveState; 32 bit; build 1004. Windows XP x64 SP2, 8 GB RAM, 500 GB Green Caviar hard disks.
perl -V says:
Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
Platform:
osname=MSWin32, osvers=5.00, archname=MSWin32-x86-multi-thread
uname=''
config_args='undef'
hint=recommended, useposix=true, d_sigaction=undef
useithreads=define, usemultiplicity=define
useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
use64bitint=undef, use64bitall=undef, uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cl', ccflags ='-nologo -GF -W3 -MD -Zi -DNDEBUG -O1 -DWIN32 -D_CONSOLE -DNO_STRICT -DHAVE_DES_FCRYPT -DUSE_SITECUSTOMIZE -DPRIVLIB_LAST_IN_INC -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -DPERL_MSVCRT_READFIX',
optimize='-MD -Zi -DNDEBUG -O1',
cppflags='-DWIN32'
ccversion='12.00.8804', gccversion='', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=10
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='link', ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf -libpath:"D:\Perl\lib\CORE" -machine:x86'
libpth=\lib
libs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib
perllibs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib
libc=msvcrt.lib, so=dll, useshrplib=true, libperl=perl510.lib
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -opt:ref,icf -libpath:"D:\Perl\lib\CORE" -machine:x86'
Characteristics of this binary (from libperl):
Compile-time options: MULTIPLICITY PERL_DONT_CREATE_GVSV
PERL_IMPLICIT_CONTEXT PERL_IMPLICIT_SYS
PERL_MALLOC_WRAP PL_OP_SLAB_ALLOC USE_ITHREADS
USE_LARGE_FILES USE_PERLIO USE_SITECUSTOMIZE
Locally applied patches:
ActivePerl Build 1004 [287188]
33741 avoids segfaults invoking S_raise_signal() (on Linux)
33763 Win32 process ids can have more than 16 bits
32809 Load 'loadable object' with non-default file extension
32728 64-bit fix for Time::Local
Built under MSWin32
Compiled at Sep 3 2008 13:16:37
@INC:
D:/Perl/site/lib
D:/Perl/lib
.
Upvotes: 1
Views: 1867
Reputation: 32354
I think that the problem is that you cannot write to file positions later than 4 GB due to the limit of 4 bytes for the file position pointer. This is even though you are using streaming output as Perl still has to keep track of the file position.
I would try to use Win32API::File instead - it allows seeking to positions larger than 4 GB by sending the high order 4 bytes of the file position pointer in a different field, and should work well using writeFile() to write to the output file.
Upvotes: 5
Reputation: 11087
I guess the "32 bit" part is the problem... The largest value you can represent in an unsigned 32-bit integer is 2^32 - 1 = 4,294,967,295, so a byte offset stored that way tops out just under 4 GB (http://en.wikipedia.org/wiki/Integer_%28computer_science%29)
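The arithmetic behind that figure can be checked with a few lines of Perl. This is only an illustration of the 32-bit ceiling; the variable names are made up for the example.

```perl
use strict;
use warnings;

my $max_u32 = 2**32 - 1;   # largest value an unsigned 32-bit integer can hold
my $gib     = 2**30;       # bytes in one GiB

printf "largest 32-bit offset: %.0f bytes\n", $max_u32;
printf "in GiB: %.9f\n", $max_u32 / $gib;
```

The result is one byte short of 4 GiB, which matches the point at which the file in the question stopped growing.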
--Edit--
I wasn't actually referring to the filesystem limit, but to the Perl limit, as it's compiled as 32-bit and can only access 4 GB of RAM. NTFS as far as I know does have a limit around 8 GB, and uses some kind of windowing method to read those files. But that's another story.
Upvotes: 2
Reputation: 129481
Here's one thing I found (link):
The INSTALL document describes several Configure-time options. Some of these will work with Cygwin, others are not yet possible. Also, some of these are experimental. You can either select an option when Configure prompts you or you can define (undefine) symbols on the command line.
...
-Duselargefiles
Although Win32 supports large files, Cygwin currently uses 32-bit integers for internal size and position calculations.
Upvotes: 5
Reputation: 64919
Hmm, that is odd. At least on OS X and Linux, the limit is imposed by the filesystem. Perhaps ActiveState Perl on Win32 is not compiled with largefile support? Could you post the result of running perl -V?
The portion of the output we care about is
Platform:
osname=MSWin32, osvers=5.00, archname=MSWin32-x86-multi-thread
uname=''
config_args='undef'
hint=recommended, useposix=true, d_sigaction=undef
useithreads=define, usemultiplicity=define
useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
use64bitint=undef, use64bitall=undef, uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Specifically, uselargefiles=define. The fact that this feature is defined (i.e. turned on) means that Perl will use an unsigned 64-bit integer for file offsets. This, theoretically, enables files up to 16 exabytes (17,179,869,184 gigabytes); however, filesystem limits often come into play before you reach that limit.
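If you would rather check this from a script than read through the whole perl -V dump, the same settings are exposed by the core Config module. The following is a small sketch; the variable names are illustrative, and the last two lines simply redo the 16 exabyte arithmetic quoted above.

```perl
use strict;
use warnings;
use Config;   # core module exposing the values shown by perl -V

# Either key may be missing on an unusual build, so fall back to 'undef'.
my $large = defined $Config{uselargefiles} ? $Config{uselargefiles} : 'undef';
my $lseek = defined $Config{lseeksize}     ? $Config{lseeksize}     : 'undef';
print "uselargefiles=$large, lseeksize=$lseek\n";

# Theoretical ceiling of a 64-bit offset (2**64 bytes), expressed in gigabytes (2**30 bytes).
my $gigabytes = 2**64 / 2**30;
printf "2**64 bytes = %.0f gigabytes\n", $gigabytes;
```

An lseeksize of 8 means file offsets are 8 bytes wide, so a 4 GB ceiling on such a build points at the filesystem rather than at Perl.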
Upvotes: 7