Radioactive Coffe

Reputation: 155

Regex To Exclude Email-Expression

I have 430 HTML files containing the "Contact Us" web pages of different organizations. I was given these files to extract emails from.

This simple regex I came up with finds the emails throughout the files:

\S*@\S*
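
To make the behaviour of this pattern concrete, here is a small Python sketch (Python's `re` module standing in for Notepad++'s regex engine, which is an assumption; the sample string and the `find_candidates` helper are made up for illustration). It suggests that `\S*@\S*` grabs any whitespace-delimited token containing `@`, including a lone `@` and surrounding HTML tags:

```python
import re

# The pattern from the question: any run of non-whitespace containing an "@".
QUESTION_PATTERN = re.compile(r'\S*@\S*')

def find_candidates(text: str) -> list[str]:
    """Return every token the question's pattern would match (hypothetical helper)."""
    return QUESTION_PATTERN.findall(text)

# Made-up sample text, not taken from the original files.
sample = 'Write to info@example.org or ping @ us at <b>sales@example.com</b>'
print(find_candidates(sample))
# → ['info@example.org', '@', '<b>sales@example.com</b>']
```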

My Problem

I'm trying to select everything besides the emails so I can use Notepad++'s "Replace All in All Opened Documents" function to delete everything besides the emails. Is this possible with regular expressions?

Is there any way I can select everything outside of the regular expression provided above?

Upvotes: 4

Views: 2174

Answers (2)

trincot

Reputation: 350270

Make sure you have a recent version of Notepad++ installed to have the necessary regex support:

Find what : (^|\s+)[^@]+(\s+|$)
Replace with : \n
🔘 Regular expression    

The . matches newline option does not influence the action.

Upvotes: 3

Wiktor Stribiżew

Reputation: 626851

You need to remove all text that does not match some pattern.

You need to match and capture the emails with a (...) capture group and then you need to just match everything else.

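Use a pattern of the form (your_pattern)|. and replace with $1. The capture group keeps each email, while the bare . matches every other character so that it is replaced with nothing.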

Or, use:

([^\s<>"]+@[^\s<>"]+)|.

or

(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b)|.

Replace with: $1

Then, you might want to use Edit -> Blank Operations -> Remove Unnecessary Blank and EOL menu option.
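
The capture-or-discard idea above can be tried outside Notepad++ as well. The following is a minimal Python sketch, assuming Python's `re` module behaves like Notepad++'s engine for this pattern; the `keep_only_emails` helper and the sample HTML are hypothetical. It uses the first pattern from this answer, replaces each match with group 1 (empty when the bare `.` matched), and then drops the blank lines, which is roughly what the blank-operations menu step does:

```python
import re

# First pattern from the answer: group 1 captures an email-like token,
# the bare "." consumes any other single character (except newlines).
KEEP_EMAILS = re.compile(r'([^\s<>"]+@[^\s<>"]+)|.')

def keep_only_emails(text: str) -> list[str]:
    """Strip everything except email-like tokens (hypothetical helper)."""
    # Equivalent of "replace with $1": group 1 is None when "." matched.
    stripped = KEEP_EMAILS.sub(lambda m: m.group(1) or '', text)
    # Newlines survive the substitution, so discard the now-empty lines.
    return [line for line in stripped.splitlines() if line]

# Made-up sample page, not taken from the original files.
sample = (
    '<p>Email us at info@example.org today.</p>\n'
    '<div>\n'
    '<span>Sales: sales@example.com</span>\n'
    '</div>\n'
)
print(keep_only_emails(sample))
# → ['info@example.org', 'sales@example.com']
```

Note that two emails on the same line would end up glued together, because the characters between them are deleted; this sketch assumes at most one email per line.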

Upvotes: 1
