Will

Reputation: 49

Substitute any other character except for a specific pattern in Perl

I have text files with lines like this:

U_town/u_LN0_pk_LN3_bnb_LN155/DD0 U_DESIGN/u_LNxx_pk_LN99_bnb_LN151_LN11_/DD5
U_master/u_LN999_pk_LN767888_bnb_LN9772/Dnn111 u_LN999_pk_LN767888_bnb_LN9772_LN9999_LN11/DD
...

I am trying to remove every character except / while keeping any word that matches the pattern _LN\d+_, using a Perl one-liner.

So the edited version would look like:

/_LN0__LN3__LN155/ /_LN99__LN151_LN11_/
/_LN999__LN767888_/ _LN999__LN767888__LN9772_LN9999_/

I tried the following, which returned empty lines:

perl -pe 's/(?! _LN\d+_)[^\/].+//g' file

Below returned only '/'.

perl -pe 's/(?! _LN\d+_)\w+//g' file

Is this perhaps not possible with a one-liner, so that I should instead write code that parses the line character by character and checks for a matching word _LN\d+_ or a / character?

Upvotes: 1

Views: 128

Answers (1)

zdim

Reputation: 66964

To merely remove everything other than these patterns, one can simply match the patterns and join the matches back together

perl -wnE'say join "", m{/ | _LN[0-9]+_ }gx' file

or perhaps, depending on details of the requirements

perl -wnE'say join "", m{/ | _LN[0-9]+(?=_) }gx' file

(See explanation in the last bullet below.)

Prints, for the first line (of the two) of the shown sample input

/_LN0__LN3_//_LN99__LN151_/
...

or, in the second version

/_LN0_LN3//_LN99_LN151_LN11/
...

The _LN155 is not there because it is not followed by _. See below.

Questions:

  • Why are there spaces after some / in the "edited version" shown in the question?

  • The pattern to keep is shown as _LN\d+_ but _LN155 is shown to be kept even though it is not followed by a _ in the input (but by a /) ...?

    Are underscores optional by any chance? If so, append ? to them in the pattern

    perl -wnE'say join "", m{/ | _?LN[0-9]+_? }gx' file
    

    with output

    /_LN0__LN3__LN155//_LN99__LN151_LN11_/
    

    (It's been clarified that the extra space in the shown desired output is a mistake.)

  • If the underscores "overlap," as in _LN151_LN11_, they won't both be matched by the _LN\d+_ pattern, since the first match "takes" the shared underscore.

    But if such overlapping instances need to be kept, then replace the trailing _ with a lookahead for it, which doesn't consume it, so it's still there to serve as the leading _ of the next match

    perl -wnE'say join "", m{/ | _LN[0-9]+(?=_) }gx' file
    

    (if the underscores are optional and you use _?LN\d+_? pattern then this isn't needed)

Upvotes: 1
