Fuhrmanator

Reputation: 12882

Multi-line regex should match multiple times in a file (one-line command if possible)

I'm trying to convert some multi-line git history output into a CSV file that records file name changes (renames and copies). Here's my regex and sample file. The regex works perfectly on that site.

Regex:

commit (.+)\n(?:.*\n)+?similarity index (\d+)+%\n(rename|copy) from (.+)\n\3 to (.+)\n

Sample input:

commit 2701af4b3b66340644b01835a03bcc760e1606f8
Author: ostrovsky.alex <ostrovsky.alex@a51b5712-02d0-11de-9992-cbdf800730d7>
Date:   Sat Oct 16 20:44:32 2010 +0000

    * Moved old sources to Maven src/main/java

diff --git a/alexo-chess/src/ao/chess/v2/move/Pawns.java b/alexo-chess/src/main/java/ao/chess/v2/move/Pawns.java
similarity index 100%
rename from alexo-chess/src/ao/chess/v2/move/Pawns.java
rename to alexo-chess/src/main/java/ao/chess/v2/move/Pawns.java

commit ea53898dcc969286078700f42ca5be36789e7ea7
Author: ostrovsky.alex <ostrovsky.alex@a51b5712-02d0-11de-9992-cbdf800730d7>
Date:   Sat Oct 17 03:30:43 2009 +0000

    synch

diff --git a/src/chess/v2/move/Pawns.java b/alexo-chess/src/ao/chess/v2/move/Pawns.java
similarity index 100%
copy from src/chess/v2/move/Pawns.java
copy to alexo-chess/src/ao/chess/v2/move/Pawns.java

commit b869f395429a2c1345ce100953bfc6038d9835f5
Author: ostrovsky.alex <ostrovsky.alex@a51b5712-02d0-11de-9992-cbdf800730d7>
Date:   Wed Oct 7 22:43:06 2009 +0000

    MctsPlayer works

diff --git a/ao/chess/v2/move/Pawns.java b/src/chess/v2/move/Pawns.java
similarity index 100%
copy from ao/chess/v2/move/Pawns.java
copy to src/chess/v2/move/Pawns.java

commit 4c697c510f5154d20be7500be1cbdecbaf99495c
Author: ostrovsky.alex <ostrovsky.alex@a51b5712-02d0-11de-9992-cbdf800730d7>
Date:   Wed Sep 23 15:06:17 2009 +0000

    * synch

diff --git a/v2/move/Pawns.java b/ao/chess/v2/move/Pawns.java
similarity index 95%
rename from v2/move/Pawns.java
rename to ao/chess/v2/move/Pawns.java
index e0172a3..e3659c5 100644
--- a/v2/move/Pawns.java
+++ b/ao/chess/v2/move/Pawns.java

However, when I run the following perl command (in Git Bash on Windows 10), I get only a single matching line, as opposed to the 4 matches shown on the site I linked to above.

I know it's probably something simple, like needing a loop. But I'm confused about how slurping with -0777 interacts with applying a pattern multiple times. I tried the -p option, but it prints the entire input, and I only want the output of the print (i.e., the CSV lines). I also thought /g would apply the pattern multiple times to the input file, but since -0777 reads the whole file as a single record, I'm not sure anymore.

<Pawns.java.history.txt perl -0777 -ne 'if (/commit (.+)\n(?:.*\n)+?similarity index (\d+)+%\n(rename|copy) from (.+)\n\3 to (.+)\n/g) { print $1.",".$2.",".$3.",".$4.",".$5."\n" }'

The output is only one line, whereas it should be 4 lines in total with the sample file:

2701af4b3b66340644b01835a03bcc760e1606f8,100,rename,alexo-chess/src/ao/chess/v2/move/Pawns.java,alexo-chess/src/main/java/ao/chess/v2/move/Pawns.java

Expected output:

2701af4b3b66340644b01835a03bcc760e1606f8,100,rename,alexo-chess/src/ao/chess/v2/move/Pawns.java,alexo-chess/src/main/java/ao/chess/v2/move/Pawns.java
ea53898dcc969286078700f42ca5be36789e7ea7,100,copy,src/chess/v2/move/Pawns.java,alexo-chess/src/ao/chess/v2/move/Pawns.java
b869f395429a2c1345ce100953bfc6038d9835f5,100,copy,ao/chess/v2/move/Pawns.java,src/chess/v2/move/Pawns.java
4c697c510f5154d20be7500be1cbdecbaf99495c,95,rename,v2/move/Pawns.java,ao/chess/v2/move/Pawns.java

Upvotes: 3

Views: 123

Answers (2)

anubhava

Reputation: 785276

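You just need to replace your if with while. In scalar context, a /g match finds one match per evaluation and resumes where the previous match ended, so the loop body runs once per match: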

perl -0777 -ne 'while (/commit (.+)\n(?:.*\n)+?similarity index (\d+)+%\n(rename|copy) from (.+)\n\3 to (.+)\n/g) { print $1.",".$2.",".$3.",".$4.",".$5."\n" }' file

2701af4b3b66340644b01835a03bcc760e1606f8,100,rename,alexo-chess/src/ao/chess/v2/move/Pawns.java,alexo-chess/src/main/java/ao/chess/v2/move/Pawns.java
ea53898dcc969286078700f42ca5be36789e7ea7,100,copy,src/chess/v2/move/Pawns.java,alexo-chess/src/ao/chess/v2/move/Pawns.java
b869f395429a2c1345ce100953bfc6038d9835f5,100,copy,ao/chess/v2/move/Pawns.java,src/chess/v2/move/Pawns.java
4c697c510f5154d20be7500be1cbdecbaf99495c,95,rename,v2/move/Pawns.java,ao/chess/v2/move/Pawns.java

Upvotes: 2

glenn jackman

Reputation: 246877

In list context, a match with /g returns the captures from every match as one flat list. Since there are 5 sets of capturing parentheses and 4 matches, the returned list has 20 elements, and you need to iterate over that list in groups of 5. Your code evaluates the match in scalar context (as the if condition), so it only looks at the first match. Here's one technique:

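perl -0777 -nE '
    # list context: every capture from every match, flattened into one list
    @matches = /commit (.+)\n(?:.*\n)+?similarity index (\d+)+%\n(rename|copy) from (.+)\n\3 to (.+)\n/g;
    $" = ",";    # separator used when an array is interpolated into a string
    while (@matches) {
        @thismatch = splice @matches, 0, 5;    # remove the next 5 captures (one match)
        say "@thismatch";
    }
' Pawns.java.history.txt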

Upvotes: 2
