Santhosh

Reputation: 11834

Negative lookahead is not possible with awk or sed; only perl supports it

I have text which spans multiple lines

    ... someabove text

  jpqpq====== mcvnmcv

    .... s;ql[[pw]]

    <<<<<< uyuuey

    ... middle text

  jhasjh  ======dsadsas

    .... grqywtrt

  klklk  <<<<<<alallal

    ... someend text

I want to remove all the text from ====== through <<<<<<, markers included.

In Sublime Text I use

find: (?s)(======(?:(?!======).)*?<<<<<<)

replace: (empty)

and all the occurrences are removed, so the output looks like this:

    ... someabove text

  jpqpq     uyuuey

    ... middle text

  jhasjh  alallal

    ... someend text

Now I want to do this from the command line using sed, awk, or anything else, because opening the file and doing the replacement every time is tedious.

I searched for sed and awk and found that they don't support zero-width assertions such as lookahead, and that perl is used in these cases.

Can someone confirm that sed and awk can't use patterns like (======(?:(?!======).)*?<<<<<<), so that some indirect approach is needed?

I am still looking for a way to do this with sed and awk (even indirectly), and also with perl (if lookahead is allowed).

With perl it didn't work either:

perl -ne 's/"(======(?:(?!======).)*?<<<<<<)"/""/g; print' file

blank output

Upvotes: 2

Views: 1641

Answers (3)

user7712945

Reputation:

If there is no < character between ====== and <<<<<< in the data file 'd', this works (tried with GNU sed):

sed -Ez 's/={6}[^<]*<{6}//g' d
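To make the stated condition concrete, here is a small sketch of what happens when a lone < does appear inside a block. It assumes GNU sed (for -z and -E); the helper name strip_simple and the sample strings are made up for illustration.

```shell
# Assumes GNU sed: -z reads the input as one record, -E enables {6} intervals.
strip_simple() { sed -Ez 's/={6}[^<]*<{6}//g'; }

# No "<" inside the block: the block is removed.
clean=$(printf 'a ======x\ny<<<<<< b\n' | strip_simple)

# A lone "<" inside the block: [^<]* cannot cross it, so nothing is removed.
stuck=$(printf 'a ======x < y<<<<<< b\n' | strip_simple)

printf '%s\n' "$clean" "$stuck"
```

The first input loses its block, while the second comes back unchanged, which is why this shortcut only fits data that never has a < between the markers.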

Upvotes: 0

Ed Morton

Reputation: 204015

Right, you don't get looka-whatever with sed or awk, but you also don't need it; it's just syntactic sugar. With GNU awk for multi-char RS and RT:

$ awk -v RS='<<<<<<' -v ORS= 'RT{sub(/======.*/,"")} 1' file
    ... someabove text

  jpqpq uyuuey

    ... middle text

  jhasjh  alallal

    ... someend text

and with GNU sed for -z:

$ sed -z 's/@/@A/g; s/{/@B/g; s/}/@C/g; s/======/{/g; s/<<<<<</}/g;
          s/{[^{}]*}//g;
          s/}/<<<<<</g; s/{/======/g; s/@C/}/g; s/@B/{/g; s/@A/@/g
' file
    ... someabove text

  jpqpq uyuuey

    ... middle text

  jhasjh  alallal

    ... someend text
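For readers unfamiliar with the placeholder trick in that sed command, the following is an annotated sketch of the same idea, not a different method. It assumes GNU sed for -z; the function name strip_blocks and the sample input are hypothetical. The point is that once each multi-character marker is a single character, a negated bracket expression can do the job the lookahead did.

```shell
# Assumes GNU sed (-z treats input without NUL bytes as a single record).
strip_blocks() {
  sed -z '
    # 1. Escape literal @, { and } so that { and } are free to use as tokens.
    s/@/@A/g; s/{/@B/g; s/}/@C/g
    # 2. Shrink each multi-character marker to one character.
    s/======/{/g; s/<<<<<</}/g
    # 3. Delete innermost token pairs: [^{}]* forbids a second opener or closer.
    s/{[^{}]*}//g
    # 4. Put back any unmatched markers that were left over.
    s/}/<<<<<</g; s/{/======/g
    # 5. Undo the escaping, with @A last so restored text is not re-read.
    s/@C/}/g; s/@B/{/g; s/@A/@/g
  '
}

# Sample input containing literal braces and an @ to exercise the escaping.
printf 'a{x}@ ======one\ntwo<<<<<< b\n' | strip_blocks
```

The literal braces and the @ in the sample survive the round trip, and only the ======...<<<<<< block is removed.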

Upvotes: 0

terdon

Reputation: 3380

Yes, neither awk nor sed supports lookarounds. More specifically, the regex flavors they use don't support them.

Your perl command failed because you need to tell it to let . match newlines as well (the s modifier), and because the double quotes inside your s/// are literal characters that never occur around the markers in your text. But that would still fail because perl reads input line by line, and would apply the replacement operator to each line. If you want it to match across the entire file, you need to slurp it with -0777. This does what you need:

$ perl -0777pe 's/======.*?<<<<<<//gs' file 
    ... someabove text

  jpqpq uyuuey

    ... middle text

  jhasjh  alallal

    ... someend text

The -0777 causes perl to slurp the entire file as a single record. The -p makes it print that record after the code has run, and the -e gives it the code to run. I also simplified your regex since there seems no reason to use such a complex approach. ======.*?<<<<<< will match ======, then the .*?<<<<<< means "as few characters as possible until the <<<<<<". Finally, the /gs at the end will let the . match newlines (s) and will make the replacement operator work globally (g) so it will replace all occurrences.


In sed, if your markers were on lines by themselves, that is, if you wanted to delete every line from the ====== line through the <<<<<< line, you could do this:

$ sed '/======/,/<<<<<</d' file 
    ... someabove text


    ... middle text


    ... someend text

But that won't work for you here, because your markers share their lines with text you want to keep.

Upvotes: 2
