Segmentation Fault when using perl regex on large file

Question

I am trying to find and replace a pattern in a very large file (43 Go) and am running into problems. I first tried sed, but it does not seem to be optimized for large files, even ones smaller than 43 Go, so I switched to perl.

I have this command: perl -0777 -i -pe 's/(public\..)*_seq/\1_id_seq/mg' dump.sql

But it generates a segmentation fault before exiting and turns my 43 Go dump into an empty (0-byte) file. The file I am trying to process is a plain PostgreSQL database dump.

For reference:

# perl --version

This is perl 5, version 26, subversion 1 (v5.26.1) built for x86_64-linux-gnu-thread-multi
(with 67 registered patches, see perl -V for more detail)

Has anyone already faced this problem, or does anyone have an idea how to solve it? I would prefer to keep this as a perl one-liner, but if you have a solution using any other program, I will take it too.

choroba · Accepted Answer

-0777 tells perl to load the whole file into memory (see perlrun). If you have less than 43 Go of memory (whatever it is), you'll have to find a way to process the file in smaller chunks. For example, try dropping the option so that perl reads the file line by line, or use -00 for the "paragraph mode".

Also note that, unlike in sed, you should use $1 instead of \1 in the replacement part of a substitution in Perl (\1 is meant for backreferences inside the pattern itself).
