Trel

Reputation: 353

Regex in perl, match newline AND first word of next line

I have a file that looks like

title="title1"  
artist="artist1"  
title="title2"  
artist="artis2"  
title="title3"  
artist="artist3"

And so on

This command
perl -pe 's/title="(.*?)"\n//ig' list.txt

is not working as I'd hoped. If I run that alone, I get just the artist lines, but if I run this

perl -pe 's/title="(.*?)"\nartist//ig' list.txt

It doesn't match at all.
I've tried with and without the /g, and with the addition of /m. I've looked at the file in nano, and I don't see any additional characters between the final " on each line and the "artist" on the next.

Anyone know what I'm doing wrong? (I'm using perl rather than sed, because the regex that generates this list uses a negative lookahead).

My goal is to be able to use a line like the one below
perl -pe 's/title="(.*?)"\nartist="(.*?)"(?:\n|$)/\2 - \1/ig' list.txt

That would output something like

artist1 - title1  
artist2 - title2  
artist3 - title3

Upvotes: 3

Views: 3437

Answers (4)

Borodin

Reputation: 126722

Your substitution

s/title="(.*?)"\n//ig

is replacing any line that looks like title="xxx" with nothing. It is deleting those lines.

It's unclear what you want, but if your requirement is to remove the title= and the quotes then you should use

perl -pe 's/title="(.*?)"/$1/i' myfile

The /g modifier is superfluous unless you expect more than one title on a single line of the file.



Update

If you want to pair titles with artists then you really need a script file. This should do what you need. The data is taken directly from your question.

use strict;
use warnings 'all';
use feature 'say';

my $title;

while ( <DATA> ) {

    if ( /title="([^"]*)"/ ) {
        $title = $1;
    }
    elsif ( /artist="([^"]*)"/ ) {
        say "$1 - $title";
    }
}


__DATA__
title="title1"
artist="artist1"
title="title2"
artist="artis2"
title="title3"
artist="artist3"

output

artist1 - title1
artis2 - title2
artist3 - title3

Upvotes: 3

Casimir et Hippolyte

Reputation: 89557

If your file is exactly as you describe it, you can use this command, which reads two lines at once. This way you avoid slurp mode:

perl -pe '$_.=<>;s/.*?"(.*?)".*?"(.*?)"/$2 - $1/s' file

If you need something more explicit, you can use:

perl -pe 'if (/^title="/){$_.=<>;s/^.*?"(.*?)"\h*\Rartist="(.*?)"\h*/$2 - $1/}' file

Upvotes: 1

dawg

Reputation: 103774

For a "slurp" approach, you can use this regex:

(^title="([^"]+)")\s*\R(^artist="([^"]+)")\s*(?:\R|\z)


Then given your example:

$ echo "$art" 
title="title1"  
artist="artist1"  
title="title2"  
artist="artis2"  
title="title3"  
artist="artist3"

Just "slurp" the file with -0777 and print $2 and $4:

$ echo "$art" | perl -0777 -lne 'while (/(^title="([^"]+)")\s*\R(^artist="([^"]+)")\s*(?:\R|\z)/gm) { print "$4 - $2\n"}'
artist1 - title1
artis2 - title2
artist3 - title3

Upvotes: 2

Gene

Reputation: 46960

You never mentioned what you're trying to do. If you want to extract the titles and artists, you'll want something like this:

our $s = q|
title="title1"
artist="artist1"
title="title2"
artist="artis2"
title="title3"
artist="artist3"
|;

my @matches = $s =~ /^title="(.*?)".*?^artist="(.*?)"/smg;

print join(';', @matches);

This prints

title1;artist1;title2;artis2;title3;artist3

Upvotes: 1
