user791254

Reputation: 21

Help on Regular Expressions

I have over 2000 aspx documents that all hold the same heading that I need to remove:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML lang="en">
<HEAD>

<TITLE>External Reference Investopedia</TITLE>
<META NAME="author" CONTENT="DERCHEC">
</HEAD>
<BODY>
<A NAME="topofpagebibliographyitem2aspx"></A>

Both the <TITLE> and <A> tags change in every file.

I need some help creating a regular expression that will select all of the above text. I am currently using TextCrawler to work through these documents in a batch. If there are better tools or methods out there, please let me know.

Regards,

CD

Upvotes: 1

Views: 121

Answers (3)

Adam Tuliper

Reputation: 30152

Use Visual Studio's Find and Replace in Files. In the find options, tick the checkbox to use regular expressions.

Find:

{\<title\>{.*}\</title\>}

Replace with nothing, i.e. leave the replacement field blank. This should get you started : )

Option 2 - download UltraEdit and do a find and replace in files on the text block - done : )
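If a script is an option instead of an editor, the same idea can be sketched in Python. This is only a rough sketch under some assumptions: the header always starts with the DOCTYPE line and ends at the first </A> tag (as in the question's sample), the files are UTF-8, and the names strip_header and strip_headers_in_tree are made up for illustration. Back up the files before running anything like this on them.

```python
import re
from pathlib import Path

# Matches from the DOCTYPE declaration through the first closing </A> tag.
# DOTALL lets "." cross line breaks; IGNORECASE tolerates <a> vs <A>.
HEADER_RE = re.compile(r"\A\s*<!DOCTYPE\b.*?</A>\s*", re.DOTALL | re.IGNORECASE)


def strip_header(text: str) -> str:
    """Return text without the leading header block (unchanged if no match)."""
    return HEADER_RE.sub("", text, count=1)


def strip_headers_in_tree(root: str, encoding: str = "utf-8") -> int:
    """Rewrite every .aspx file under root in place; return how many changed.

    The encoding default is an assumption; adjust it to match the real files.
    """
    changed = 0
    for path in Path(root).rglob("*.aspx"):
        original = path.read_text(encoding=encoding)
        stripped = strip_header(original)
        if stripped != original:
            path.write_text(stripped, encoding=encoding)
            changed += 1
    return changed


sample = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">\n'
    '<HTML lang="en">\n<HEAD>\n\n'
    '<TITLE>External Reference Investopedia</TITLE>\n'
    '<META NAME="author" CONTENT="DERCHEC">\n</HEAD>\n<BODY>\n'
    '<A NAME="topofpagebibliographyitem2aspx"></A>\n'
    '<P>Body text</P>\n'
)
print(strip_header(sample))
```

Because the pattern is lazy (.*?), it stops at the first </A>, so the varying <TITLE> text and <A NAME> value do not need to be spelled out.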

Upvotes: 1

Declan Cook

Reputation: 6126

If the bit you want to remove always ends with the </A> tag, then you could just use a normal string split function in any language.
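As a rough illustration of the split idea, here is a minimal Python sketch. It assumes the first </A> in the file is the one that closes the header and that the tag is written in upper case, as in the question; drop_through_marker is a hypothetical helper name.

```python
def drop_through_marker(text: str, marker: str = "</A>") -> str:
    """Discard everything up to and including the first occurrence of marker.

    If the marker is missing, the text is returned unchanged.
    """
    _head, sep, tail = text.partition(marker)
    if not sep:  # marker not found, so leave the document alone
        return text
    return tail.lstrip("\r\n")  # drop the line break left after the marker


page = '<HEAD>\n<TITLE>Some title</TITLE>\n</HEAD>\n<BODY>\n<A NAME="top"></A>\n<P>Body text</P>\n'
print(drop_through_marker(page))
```

partition splits only on the first match, so any later </A> tags in the page body are left alone.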

Upvotes: 0

m.edmondson

Reputation: 30922

Simple! The regular expression will be exactly the same text you need to remove. So if you want to match:

<HTML lang="en">

your regular expression will be:

<HTML lang="en">

The only time you'll have a problem is when the text contains a character that has a reserved meaning in regular expressions. In that case you just need to prefix it with a backslash (\).

So if you need to match a question mark (?) the regex would be \?
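To illustrate the escaping point, here is a small Python sketch (Python is just an example here; the helper name matches_literally is made up). The standard re.escape function adds the backslashes for you, so a literal string can be used as a pattern without escaping each reserved character by hand.

```python
import re


def matches_literally(literal: str, text: str) -> bool:
    """Return True if text contains literal, treating literal as plain text."""
    return re.search(re.escape(literal), text) is not None


# Text without troublesome characters works as its own pattern.
print(matches_literally('<HTML lang="en">', '<HTML lang="en">\n<HEAD>'))

# An unescaped "?" makes the preceding character optional...
print(re.search(r"why?", "wh") is not None)
# ...while "\?" only matches an actual question mark.
print(re.search(r"why\?", "wh") is not None)
print(re.search(r"why\?", "why?") is not None)
```

Note that a purely literal pattern only matches one exact string, so it will not cover lines whose content varies from file to file.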

Upvotes: 0
