Reputation: 143
I have a big HTML file in which I need to delete every nth (6th and 7th) occurrence of the p element. How can I do this in Notepad++ using a Python script or a regex?
<p>Some text</p>
<p>Some text</p>
<p>Some text</p>
Thanks
Upvotes: 0
Views: 1380
Reputation: 6935
Go to the Search > Replace menu (shortcut Ctrl+H) and do the following:
Find what:
(?:<p>[^<]+<\/p>\r?\n){5}\K(?:<p>[^<]+<\/p>\r?\n){1,2}((?:<p>[^<]+<\/p>\r?\n)*)
Replace:
$1
Select the "Regular expression" radio button, then press Replace All.
You can test it online at regex101.
Please note that this deletes only the 6th and 7th elements of a consecutive list of <p> elements. If you also want to delete the 12th and 13th elements in the same consecutive list, then Zaheer is right, and you should probably use an HTML parser.
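To make that limitation concrete, here is a rough Python sketch of the same pattern. Python's re module has no \K, so this version captures the first five elements and puts them back instead; the names P_LINE, PATTERN and delete_6th_and_7th are made up for the illustration, and the sample input is invented.

```python
import re

# One newline-terminated <p>...</p> line with no nested tags, as in the pattern above.
P_LINE = r"<p>[^<]+</p>\r?\n"

# Python's re has no \K, so the first five elements are captured and re-emitted.
# Group 2 swallows the rest of the consecutive run, exactly like $1 in the original.
PATTERN = re.compile(r"((?:%s){5})(?:%s){1,2}((?:%s)*)" % (P_LINE, P_LINE, P_LINE))

def delete_6th_and_7th(text):
    """Remove the 6th and 7th element of each consecutive run of <p> lines."""
    return PATTERN.sub(r"\1\2", text)

# Hypothetical sample: one run of 14 consecutive paragraphs.
sample = "".join("<p>Text %d</p>\n" % i for i in range(1, 15))
result = delete_6th_and_7th(sample)
```

Because the single match consumes the whole run, the 12th and 13th paragraphs of the sample are still present in result.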
Upvotes: 1
Reputation: 89639
You can try this:
search : ((?:<p\b(?:[^<]+|<(?!/p>))*</p>(?:[^<]+|<(?!p\b))*){5})<p\b(?:[^<]+|<(?!/p>))*</p>((?:[^<]+|<(?!p\b))*)<p\b(?:[^<]+|<(?!/p>))*</p>
replace: $1$2
Upvotes: 0
Reputation: 28588
I would recommend using a programming language that reads the file and deletes the elements with a loop or whatever other logic is required. Regex in Notepad++ is very limited, and once you delete the 6th and 7th occurrences, the former 12th and 13th occurrences become the 10th and 11th, and every later position shifts down accordingly.
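As a minimal sketch of that idea in Python: count the p elements in a single pass and decide for each one, by its position in the original file, whether to drop it, so the shifting problem never comes up. This assumes "each nth" means the 6th and 7th, 12th and 13th, and so on. The function name drop_nth_paragraphs and its period and run parameters are invented for the example, and the regex is only a stand-in for a real HTML parser (it assumes p elements are not nested).

```python
import itertools
import re

# One <p ...>...</p> element plus an optional trailing line break.
# \b keeps tags such as <pre> from matching.
P_ELEMENT = re.compile(r"<p\b[^>]*>.*?</p>(?:\r?\n)?", re.DOTALL | re.IGNORECASE)

def drop_nth_paragraphs(html, period=6, run=2):
    """Drop `run` paragraphs starting at every `period`-th one.

    Positions are 1-based and counted in the ORIGINAL text, so with the
    defaults the 6th, 7th, 12th, 13th, ... paragraphs are removed.
    """
    positions = itertools.count(1)

    def keep_or_drop(match):
        position = next(positions)
        if position >= period and position % period < run:
            return ""            # delete this element
        return match.group(0)    # keep it unchanged

    return P_ELEMENT.sub(keep_or_drop, html)

# Hypothetical sample: 14 paragraphs, one per line.
sample = "".join("<p>Some text %d</p>\n" % i for i in range(1, 15))
cleaned = drop_nth_paragraphs(sample)
```

For a different reading of the question (for example dropping the 6th and 7th of every block of seven), only the condition inside keep_or_drop would need to change.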
Upvotes: 2