Reputation: 21
I have over 2000 .aspx files that all begin with the same header block, which I need to remove:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML lang="en">
<HEAD>
<TITLE>External Reference Investopedia</TITLE>
<META NAME="author" CONTENT="DERCHEC">
</HEAD>
<BODY>
<A NAME="topofpagebibliographyitem2aspx"></A>
The contents of both the <TITLE> and <A> tags change in every file.
I need some help creating a regular expression that will match all of the text above. I am currently using TextCrawler to work through these documents in a batch. If there are better tools or methods out there, please let me know.
Regards,
CD
Upvotes: 1
Views: 121
Reputation: 30152
Use Visual Studio's Find and Replace in Files. In the find options, tick the checkbox to use regular expressions.
Find:
{\<Title\>{.*}\</title\>}
Replace with nothing, i.e. leave the replacement field blank. This should get you started : )
Option 2: download UltraEdit and do a find and replace in files on the text block. Done : )
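As a scripted alternative to the editor-based options above, here is a minimal Python sketch of a pattern that matches the whole block, from the DOCTYPE line through the first closing </A> tag, regardless of what the <TITLE> and <A> tags contain. The function names, the directory argument, and the UTF-8 encoding are assumptions for illustration, and it assumes the first </A> in each file is the one that ends the header. Run it on a backup copy first.

```python
import re
from pathlib import Path

# Matches from the DOCTYPE declaration through the first closing </A> tag,
# plus the line break that follows it.
# DOTALL lets "." span newlines; the non-greedy ".*?" stops at the first </A>.
HEADER_RE = re.compile(
    r"<!DOCTYPE HTML PUBLIC.*?</A>[ \t]*\r?\n?",
    re.DOTALL | re.IGNORECASE,
)


def strip_header(text: str) -> str:
    """Return text with the first header block removed (unchanged if absent)."""
    return HEADER_RE.sub("", text, count=1)


def strip_headers_in_tree(root: str, encoding: str = "utf-8") -> int:
    """Rewrite every .aspx file under root; return how many files changed.

    The encoding is an assumption; adjust it to match the real files.
    """
    changed = 0
    for path in Path(root).rglob("*.aspx"):
        original = path.read_text(encoding=encoding)
        stripped = strip_header(original)
        if stripped != original:
            path.write_text(stripped, encoding=encoding)
            changed += 1
    return changed
```

Calling strip_headers_in_tree on a copy of the site folder (the path is yours to supply) would process all files in one pass and report how many were modified.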
Upvotes: 1
Reputation: 6126
If the bit you want to remove always ends with the </A> tag, then you could just use a normal string split function in any language.
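A rough sketch of that split idea in Python, assuming the first </A> in the file is the one that closes the header and that the tag is written in upper case as in the question. The function name is made up for illustration.

```python
def drop_through_marker(text: str, marker: str = "</A>") -> str:
    """Return everything after the first occurrence of marker.

    If the marker is not found, the text is returned unchanged.
    The comparison is case-sensitive.
    """
    head, sep, tail = text.partition(marker)
    if not sep:
        return text
    # Drop the line break that followed the marker.
    return tail.lstrip("\r\n")
```

No regular expression is needed here, which makes it harder to get wrong, but it will cut at the wrong place in any file where an earlier </A> appears before the header ends.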
Upvotes: 0
Reputation: 30922
Simple! The regular expression will be exactly the same text you need to remove. So if you want to match:
<HTML lang="en">
your regular expression will be:
<HTML lang="en">
The only time you'll have a problem is when the text contains a character that has a reserved meaning in regular expressions; in that case you just need to prefix it with a backslash (\).
So if you need to match a question mark (?), the regex would be \?
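To illustrate the escaping point, here is a small Python sketch; most regex libraries have a helper that adds the backslashes for you, and in Python that helper is re.escape. The function name literal_pattern is hypothetical. Note that this only covers the fixed parts of the header, not the <TITLE> and <A> contents that differ per file.

```python
import re


def literal_pattern(text: str) -> "re.Pattern[str]":
    """Compile a pattern that matches text exactly, character for character."""
    return re.compile(re.escape(text))


# Plain text with no reserved characters can be used as-is.
assert re.search('<HTML lang="en">', '<HTML lang="en">\n<HEAD>')

# Unescaped, "?" makes the preceding "o" optional, so "rati" matches too.
assert re.fullmatch("ratio?", "rati")

# Escaped, the question mark must appear literally.
assert literal_pattern("ratio?").search("P/E ratio?")
assert literal_pattern("ratio?").search("P/E ratio") is None
```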
Upvotes: 0