CSchulz

Reputation: 11020

Search and replace across multiple lines in large files

I want to search and replace in large files (about 900 MB). I have been searching the web for hours.

In general, two tools seem suitable: sed and perl.
The multi-line syntax for sed seems very complex, so I gave perl a try.

My input data looks like the following:

K 13
svn:mergeinfo
V 498
/code/branches/TEST_ENVIRONMENT_OBC/Implementation//SpecificComponents/SUV/config/TEST:4670-4976
/code/tags/QR_20131111/Implementation/SpecificComponents/SUV/config/TEST:4669
/code/tags/QR_20131211/Implementation/SpecificComponents/SUV/config/TEST:5138
/code/trunk/Implementation/SpecificComponents/SUV/config/OBC:4669-4949
/code/trunk/Implementation/SpecificComponents/SUV/config/TEST:5137-5273
PROPS-END

I want to change the svn:mergeinfo block and remove the leading /code from each path.
So I wrote a small regex for perl.

perl -0pe 's/^svn:mergeinfo\nV (\d+)\n(?:\/code(\/(?:branches|tags|trunk)(?:.|\n)+))+\nPROPS-END/svn:mergeinfo\nV \1\n\2\nPROPS-END/m'

It works so far, but in the output only the first path is changed.

K 13
svn:mergeinfo
V 498
/branches/TEST_ENVIRONMENT_OBC/Implementation//SpecificComponents/SUV/config/TEST:4670-4976
/code/tags/QR_20131111/Implementation/SpecificComponents/SUV/config/TEST:4669
/code/tags/QR_20131211/Implementation/SpecificComponents/SUV/config/TEST:5138
/code/trunk/Implementation/SpecificComponents/SUV/config/OBC:4669-4949
/code/trunk/Implementation/SpecificComponents/SUV/config/TEST:5137-5273
PROPS-END

What do I need to change to replace all occurrences of the path?

There is no requirement to use perl to solve the problem.

Upvotes: 0

Views: 170

Answers (4)

BMW

Reputation: 45223

Using awk:

awk '/svn:mergeinfo/,/PROPS-END/{sub(/^\/code/,"")}1' file

Upvotes: 1

mpapec

Reputation: 50637

perl -pe '$/ ="PROPS-END"; s!/code(?=/(?:branches|tags|trunk))!!g' file

Output:

K 13
svn:mergeinfo
V 498
/branches/TEST_ENVIRONMENT_OBC/Implementation//SpecificComponents/SUV/config/TEST:4670-4976
/tags/QR_20131111/Implementation/SpecificComponents/SUV/config/TEST:4669
/tags/QR_20131211/Implementation/SpecificComponents/SUV/config/TEST:5138
/trunk/Implementation/SpecificComponents/SUV/config/OBC:4669-4949
/trunk/Implementation/SpecificComponents/SUV/config/TEST:5137-5273
PROPS-END

Upvotes: 1

devnull

Reputation: 123448

You could use sed:

sed -r '/svn:mergeinfo/,/PROPS-END/{s#(/code)(/(branches|tags|trunk))(.*)#\2\4#}' inputfile

This would perform the substitution between the lines matching the patterns svn:mergeinfo and PROPS-END.

For your input, it results in:

K 13
svn:mergeinfo
V 498
/branches/TEST_ENVIRONMENT_OBC/Implementation//SpecificComponents/SUV/config/TEST:4670-4976
/tags/QR_20131111/Implementation/SpecificComponents/SUV/config/TEST:4669
/tags/QR_20131211/Implementation/SpecificComponents/SUV/config/TEST:5138
/trunk/Implementation/SpecificComponents/SUV/config/OBC:4669-4949
/trunk/Implementation/SpecificComponents/SUV/config/TEST:5137-5273
PROPS-END
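
To illustrate the address range described above, here is a minimal sketch on a hypothetical, shortened sample that also contains /code lines outside the svn:mergeinfo block. It assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed both do), and it anchors the range patterns and the substitution to the start of the line, which the command above does not.

```shell
# Hypothetical sample: two /code lines sit outside the mergeinfo block.
sample=$(mktemp)
cat > "$sample" <<'EOF'
/code/trunk/outside/the/block:1
K 13
svn:mergeinfo
V 498
/code/tags/T1/config/TEST:4669
/code/trunk/config/OBC:4669-4949
PROPS-END
/code/trunk/also/outside:2
EOF

# The substitution runs only on lines from svn:mergeinfo through PROPS-END.
# \1 keeps the /branches/, /tags/ or /trunk/ part and drops the /code prefix.
sed_result=$(sed -E '/^svn:mergeinfo$/,/^PROPS-END$/ s#^/code(/(branches|tags|trunk)/)#\1#' "$sample")
printf '%s\n' "$sed_result"
rm -f "$sample"
```

Lines 1 and 8 of the sample keep their /code prefix because they fall outside the range. Since sed processes the file line by line, memory use should stay small even for a 900 MB input.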

Upvotes: 2

Jarmund

Reputation: 3205

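You need to add the global flag as well. The m flag only changes where ^ and $ match; it does not make the substitution repeat, so the regex replaces once and then finishes. The global flag makes it keep trying to match until there is no more input text.

Add a g after the m at the end of your regex. Note that g alone may not be enough with your pattern: the greedy (?:.|\n)+ consumes the rest of the block in a single match, so each /code path also has to be matched separately.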

Upvotes: 0
