joesmity1234

Reputation: 1

Notepad++ and regex (multiline)

I have been facing a challenge. I have a text file with the following pattern:

SOME RANDOM TITLE IN CAPS (nnnn)
text text text
more text 
...
SOME OTHER RANDOM TITLE IN CAPS (nnnn)

What I know for sure is that the lines I want to extract end with a year in parentheses, e.g. (2015) or (2008). There is no text after the (nnnn); sometimes there is a space and then CR LF, sometimes just CR LF.

I would like to delete everything else and keep just the title lines with the parentheses.

In the time I have spent on this I could have done it by hand (there are 100 lines), but I like the challenge :)

I thought I could find the issue but I am stuck.

I have tried something along this line:

^.*\(\d\d\d\d\)(?s)(.*)(^.*\(\d\d\d\d\))

But I don't get what I want. I can't seem to stop the (?s)(.*) from running all the way to the end of the text instead of stopping at the next occurrence.

Upvotes: 0

Views: 2186

Answers (3)

Julio

Reputation: 5308

If you want to remove all lines except those that end with a four-digit number in parentheses, you can try this:

^(?!.*\(\d{4}\)\h*$).*(?:\r?\n|\z)

Replace with: (nothing)
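For readers who want to try the same idea outside Notepad++, here is a minimal Python sketch. It is an adaptation, not the Notepad++ pattern itself: Python's re module has no \h or \z, so [ \t] and \Z are used instead, and an optional \r is added so that $ still works with CR LF line endings. The function name keep_title_lines and the sample text are made up for illustration.

```python
import re

# Adapted from the Notepad++ pattern above. Assumptions of this sketch:
# [ \t] replaces \h, \Z replaces \z, and \r? lets $ match on CR LF endings.
NON_TITLE_LINE = re.compile(
    r"^(?!.*\(\d{4}\)[ \t]*\r?$).*(?:\r?\n|\Z)",
    re.MULTILINE,
)


def keep_title_lines(text: str) -> str:
    """Delete every line that does not end in a parenthesized 4-digit number."""
    return NON_TITLE_LINE.sub("", text)


# Hypothetical sample with CR LF endings and one trailing space after (2010).
sample = (
    "SOME RANDOM TITLE IN CAPS (2008)\r\n"
    "text text text\r\n"
    "more text \r\n"
    "...\r\n"
    "SOME OTHER RANDOM TITLE IN CAPS (2010) \r\n"
)

result = keep_title_lines(sample)
print(result)
```

The negative lookahead is checked once at the start of each line, so a line is either kept whole or deleted together with its line break.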


Upvotes: 0

Poul Bak

Reputation: 10929

The following regex matches the two lines that contain four digits in parentheses:

.*?\(\d{4}\)\s*

It lazily matches any characters from the start of the line (zero or more, non-greedy), then an opening parenthesis, four digits, and a closing parenthesis. Finally, \s* consumes any trailing whitespace, including the line break.
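As a rough illustration of how this pattern behaves, the following Python sketch runs it over a made-up sample. It assumes "." does not match line breaks (the default in Python; in Notepad++ the ". matches newline" box would have to be unchecked). The names TITLE, sample and titles are only for this example.

```python
import re

# Same pattern as in the answer. "." stops at line breaks here, so the lazy
# .*? cannot run past the end of a line that has no (nnnn) in it.
TITLE = re.compile(r".*?\(\d{4}\)\s*")

# Hypothetical sample text with LF line endings.
sample = (
    "SOME RANDOM TITLE IN CAPS (2008)\n"
    "text text text\n"
    "more text \n"
    "...\n"
    "SOME OTHER RANDOM TITLE IN CAPS (2010)\n"
)

# Each match includes the trailing whitespace that \s* consumed, so strip it.
titles = [m.group().rstrip() for m in TITLE.finditer(sample)]

for title in titles:
    print(title)
```

Note that this extracts the matching lines rather than deleting the others, so in Notepad++ it would need to be combined with a copy or bookmark step.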

Upvotes: 0

D.B.

Reputation: 4713

I suggest using the Search > Mark feature. Use a pattern like \(\d{4}\) with the search mode set to "Regular expression", check the "Bookmark Line" option, then click "Mark All". Then use Search > Bookmark > Remove Unmarked Lines. This removes every line except those that matched your pattern.


Note: If your other lines can also contain four digits in parentheses, add $ to the end of the expression so that the pattern only matches at the end of a line. E.g. more text (1234) and other stuff is matched by the pattern I gave above, but not by \(\d{4}\)$. Since your title lines sometimes have a trailing space before the line break, \(\d{4}\)\h*$ covers both cases.

If you want to be even more specific and match only lines that consist of uppercase letters and spaces followed by four digits in parentheses at the end of the line, you could use a pattern like this: ^[A-Z ]+\(\d{4}\)\h*$ (check "Match case", otherwise [A-Z] also matches lowercase letters).
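To see how the three levels of strictness differ, here is a small Python sketch that tests single lines against each of them. The patterns are adapted for this sketch: [ \t]* allows optional trailing blanks (Python has no \h), and the uppercase variant is anchored with ^ so the whole line has to be a title. The dictionary keys and the function name matching_patterns are made up for illustration.

```python
import re

# Three increasingly strict patterns (adapted for Python, see the note above).
PATTERNS = {
    "anywhere": re.compile(r"\(\d{4}\)"),
    "at_end": re.compile(r"\(\d{4}\)[ \t]*$"),
    "caps_title": re.compile(r"^[A-Z ]+\(\d{4}\)[ \t]*$"),
}


def matching_patterns(line: str) -> list[str]:
    """Return the names of all patterns that match the given line."""
    return [name for name, rx in PATTERNS.items() if rx.search(line)]


# Hypothetical lines, without their line breaks.
for line in (
    "SOME RANDOM TITLE IN CAPS (2008)",
    "more text (1234) and other stuff",
    "more text (1234)",
):
    print(f"{line!r}: {matching_patterns(line)}")
```

Python's re is case-sensitive by default, which corresponds to having "Match case" checked in Notepad++.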


Sample input:

SOME RANDOM TITLE IN CAPS (2008)
text text text
more text 
...
SOME OTHER RANDOM TITLE IN CAPS (2010)

Mark the lines as described above. After clicking "Mark All", the two title lines are bookmarked.

Now use Search > Bookmark > Remove Unmarked Lines and only the two title lines remain.
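If you ever need to repeat this outside the editor, the mark-and-remove workflow can be approximated in a few lines of Python. This is only a sketch of the same idea, not something Notepad++ provides: the function name remove_unmarked_lines is made up, and the sample text is the sample input from above with LF line endings assumed.

```python
import re


def remove_unmarked_lines(text: str, pattern: str) -> str:
    """Keep only the lines in which the pattern matches, like Mark All
    followed by Remove Unmarked Lines. Line endings are preserved."""
    rx = re.compile(pattern)
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if rx.search(line.rstrip("\r\n"))  # test the line without its break
    ]
    return "".join(kept)


sample = (
    "SOME RANDOM TITLE IN CAPS (2008)\n"
    "text text text\n"
    "more text \n"
    "...\n"
    "SOME OTHER RANDOM TITLE IN CAPS (2010)\n"
)

cleaned = remove_unmarked_lines(sample, r"\(\d{4}\)")
print(cleaned)
```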

Upvotes: 3
