A S

Reputation: 1235

Replacing a bunch of lines in a bunch of files

Let's say I have several thousand HTML files with some text inside them (articles, actually). Each file also contains all sorts of scripts, styles, counters, and other junk somewhere above the actual text.

My task is to replace everything from the very beginning of each file up to a certain tag (i.e., we start at <head> and end at <div class="StoryGoesBelow">) with a clean

<html>
<head>
</head>
<body>

block.

Is there a regex-based way to do this? In Vim? In any other editor? In a scripting language?

Thanks.

Upvotes: 0

Views: 57

Answers (1)

Tim Pietzcker

Reputation: 336108

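The simplest regex for this would be (?s)\A.*?(?=<div class="StoryGoesBelow">) (assuming you want to keep the <div> tag). Replace the match with the clean block from your question. This needs a regex flavor that supports (?s) and \A, such as PCRE or Python's re module, and it has to be applied to the whole file as one string rather than line by line.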

Explanation:

(?s)   # Allow the dot to match newlines
\A     # Anchor the search at the start of the string
.*?    # Match any number of characters, as few as possible
(?=<div class="StoryGoesBelow">)  # and stop right before this <div>
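
As a rough illustration of how this regex could be applied to a whole directory tree, here is a minimal Python sketch. The function names (replace_header, process_tree), the *.html glob, and the UTF-8 encoding are assumptions made for the example, not something from the question, so adjust them to your files and try it on a copy first.

```python
import re
from pathlib import Path

# The regex from the answer: everything from the start of the file up to
# (but not including) the marker <div>.
HEADER_RE = re.compile(r'(?s)\A.*?(?=<div class="StoryGoesBelow">)')

# The clean block from the question.
CLEAN_HEADER = "<html>\n<head>\n</head>\n<body>\n"


def replace_header(html: str) -> str:
    """Return html with everything before the marker <div> replaced.

    If the marker is missing, the regex does not match and the text is
    returned unchanged.
    """
    # A function replacement inserts CLEAN_HEADER literally, so no
    # backslash or group-reference processing is applied to it.
    return HEADER_RE.sub(lambda m: CLEAN_HEADER, html, count=1)


def process_tree(root: str, encoding: str = "utf-8") -> int:
    """Rewrite every *.html file under root; return how many files changed.

    Hypothetical helper: the glob pattern and the encoding are assumptions.
    """
    changed = 0
    for path in Path(root).rglob("*.html"):
        original = path.read_text(encoding=encoding)
        updated = replace_header(original)
        if updated != original:
            path.write_text(updated, encoding=encoding)
            changed += 1
    return changed
```

Calling process_tree("path/to/articles") rewrites the files in place. Running it a second time should change nothing, since the clean block is simply replaced by itself.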

This will fail, of course, if the text <div class="StoryGoesBelow"> could also occur in a comment or a literal string somewhere above the actual tag.
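
To make that caveat concrete, here is a small Python sketch with a made-up page in which the marker text also appears inside a comment. The sample page and the is_ambiguous helper are hypothetical and only meant to show one possible guard, not a complete solution.

```python
import re

MARKER = '<div class="StoryGoesBelow">'
HEADER_RE = re.compile(r'(?s)\A.*?(?=<div class="StoryGoesBelow">)')

# Hypothetical page: the marker text occurs in a comment above the real tag.
page = (
    '<html><head>\n'
    '<!-- <div class="StoryGoesBelow"> moved further down -->\n'
    '</head><body>\n'
    '<div class="StoryGoesBelow">Article text</div>'
)

match = HEADER_RE.match(page)
cut = match.end()            # where the lazy match stops
first = page.index(MARKER)   # occurrence inside the comment
last = page.rindex(MARKER)   # the real tag


def is_ambiguous(html: str) -> bool:
    """Return True if the marker text occurs more than once."""
    return html.count(MARKER) > 1
```

Here cut equals first, not last, so the replacement would end in the middle of the comment. Skipping (or logging) files for which is_ambiguous returns True is one cheap way to find the pages that need manual attention.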

Upvotes: 1
