Erdem Sarp

Reputation: 31

Remove everything except some word from every line

I am trying to find a regex solution that keeps &#xA; and removes everything else without breaking the line order. Only some of the lines contain this pattern, one or more times. I tried (?<=&#xA;)(.+)|(.+)(?=&#xA;)|^((?!&#xA;).)*$, but it keeps only one occurrence per line, even when the line contains more. For example, I have something like this:

The client requires photos of a radioactive world&#xA;Reach the target planet.
The client requires photos.&#xA;&#xA;Reach the target planet.
The client requires photos of a desert world&#xA;Reach the target planet.
The client requires photos of an airless world. Reach the target planet.
The client requires photos of a strange world&#xA;&#xA;Reach the target planet&#xA;Make a quick scan.

Expecting exactly this:

&#xA;
&#xA;&#xA;
&#xA;

&#xA;&#xA;&#xA;

I would be glad if you could help.

Upvotes: 2

Views: 737

Answers (3)

The fourth bird

Reputation: 163207

You can use (*SKIP)(*FAIL) to match &#xA; and then discard that match, so it is left untouched by the replacement.

The second alternative matches any characters except &; when it does encounter an &, it asserts that the & is not directly followed by #xA;

Find what

&#xA;(*SKIP)(*FAIL)|[^&\r\n]+(?:&(?!#xA;)[^&\r\n]*)*

Replace with:

Leave empty

Explanation

  • &#xA; Match literally
  • (*SKIP)(*FAIL)| Skip what was matched so far and fail, so the text you want to keep is never part of a match; | starts the alternative
  • [^&\r\n]+ Match 1+ times any char except & or a newline
  • (?: Non capture group
    • &(?!#xA;) Match & if not directly followed by #xA;
    • [^&\r\n]* Match 0+ times any char except & or a newline
  • )* Close the non capture group and repeat 0+ times
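
Python's built-in re module does not support the (*SKIP)(*FAIL) verbs, so the pattern above cannot be pasted there unchanged. A minimal sketch of the same idea, assuming Python and a hypothetical helper name keep_entities: capture &#xA; in a group and put it back, while every other match is replaced with nothing.

```python
import re

# Alternative 1 captures the token to keep; alternative 2 is the
# "everything else" branch taken from the answer's pattern.
PATTERN = re.compile(r"(&#xA;)|[^&\r\n]+(?:&(?!#xA;)[^&\r\n]*)*")


def keep_entities(text: str) -> str:
    """Remove everything except &#xA; and line breaks (hypothetical helper)."""
    # group(1) is None when the second alternative matched, so that text is dropped.
    return PATTERN.sub(lambda m: m.group(1) or "", text)


sample = "\n".join([
    "The client requires photos of a radioactive world&#xA;Reach the target planet.",
    "The client requires photos.&#xA;&#xA;Reach the target planet.",
    "The client requires photos of a desert world&#xA;Reach the target planet.",
    "The client requires photos of an airless world. Reach the target planet.",
    "The client requires photos of a strange world&#xA;&#xA;Reach the target planet&#xA;Make a quick scan.",
])

print(keep_entities(sample))
```

Line breaks are never matched by either alternative, so the line order stays intact and the fourth line comes out empty.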

Upvotes: 1

You could use a capturing group.

(.*?)((?:&#xA;){0,})

Details:

  • (.*?): Group 1 - matches any characters, as few as possible
  • ((?:&#xA;){0,}): Group 2 - matches zero or more consecutive occurrences of &#xA;
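
The answer does not say what to replace with; presumably the replacement is the second group (written $2 or \2, depending on the tool). A sketch in Python under that assumption, where strip_except_entity is a hypothetical name and the empty-match handling of Python 3.7 or later is assumed:

```python
import re


def strip_except_entity(text: str) -> str:
    """Keep only runs of &#xA;, assuming the replacement is group 2."""
    # Group 1 (the text to drop) is discarded; group 2 (runs of &#xA;) is kept.
    # "." does not match "\n" by default, so line breaks are left in place.
    return re.sub(r"(.*?)((?:&#xA;){0,})", r"\2", text)


line = "The client requires photos.&#xA;&#xA;Reach the target planet."
print(strip_except_entity(line))
```

Because both groups can match the empty string, the pattern also produces many empty matches; they are harmless here, since each one is replaced with an empty group 2.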

Upvotes: 1

Ankit

Reputation: 702

You can use the following regex to match everything except &#xA;:

[^&#xA;\n]+
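
One thing to be aware of: the brackets form a character class, so the pattern excludes the individual characters &, #, x, A and ; rather than the sequence &#xA;. A quick check in Python (strip_with_class is a hypothetical name, and the second input is a made-up example, not from the question) shows where that matters:

```python
import re

CHAR_CLASS = re.compile(r"[^&#xA;\n]+")


def strip_with_class(text: str) -> str:
    """Delete every run of characters other than &, #, x, A, ; and newline."""
    return CHAR_CLASS.sub("", text)


# The question's data has none of those characters outside &#xA;, so this works.
print(strip_with_class("The client requires photos.&#xA;&#xA;Reach the target planet."))

# Made-up input: the x and A in "Max Axis" survive the replacement.
print(strip_with_class("Max Axis&#xA;ok"))
```

For the sample lines in the question this gives the expected result; for text containing a stray x, A, #, ; or &, those characters would be left behind.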

Upvotes: 1
