Reputation: 155
I have 430 HTML files containing the "Contact Us" pages of different organizations, and I was given these files to extract emails from.
This simple regex I came up with finds the emails throughout the files:
\S*@\S*
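For illustration, here is a minimal Python sketch of what this pattern picks up. The sample string is hypothetical and not taken from the real files. It suggests the pattern grabs the whole whitespace-delimited token around each @, so surrounding markup and punctuation come along with the address.

```python
import re

# The pattern from the question: any run of non-whitespace around an "@".
PATTERN = re.compile(r"\S*@\S*")

# Hypothetical sample text (an assumption, not from the actual HTML files).
sample = (
    'Contact: <a href="mailto:info@example.org">info@example.org</a> '
    "or sales@example.org."
)

# Each match is the full whitespace-delimited token containing an "@".
matches = PATTERN.findall(sample)

for m in matches:
    print(m)
```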
My Problem
I'm trying to select everything besides the emails so I can use Notepad++'s "Replace All in All Opened Documents" function to delete everything besides the emails. Is this possible with regular expressions?
Is there any way I can select everything outside of the regular expression provided above?
Upvotes: 4
Views: 2174
Reputation: 350270
Make sure you have a recent version of Notepad++ installed to have the necessary regex support:
Find what : (^|\s+)[^@]+(\s+|$)
Replace with : \n
🔘 Regular expression
The ". matches newline" option does not influence the action.
Upvotes: 3
Reputation: 626851
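You need to remove all text that does not match some pattern.
To do that, match and capture the emails with a (...) capture group, and then just match everything else.
Use a pattern like this: ( + your_pattern + )|. and replace with $1.
When the . alternative matches, group 1 is empty, so that character is replaced with nothing; when the email alternative matches, $1 puts the email back.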
Or, use:
([^\s<>"]+@[^\s<>"]+)|.
or
(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b)|.
Replace with: $1
Then, you might want to use Edit -> Blank Operations -> Remove Unnecessary Blank and EOL menu option.
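The same capture-or-discard idea can be sketched outside Notepad++ as well. The Python below is only an illustration under stated assumptions: the function name keep_only_matches and the sample text are hypothetical, and the final step that drops empty lines is a rough stand-in for the Blank Operations menu item, not an exact equivalent.

```python
import re

# Second pattern from the answer above, as a Python raw string.
EMAIL = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b"


def keep_only_matches(text, pattern=EMAIL):
    """Hypothetical helper: keep what `pattern` matches, delete the rest."""
    # (pattern)|. : group 1 captures an email; otherwise the bare "."
    # consumes one character (newlines are left alone, as "." skips them).
    combined = re.compile("(" + pattern + ")|.")
    # Equivalent of replacing with $1: emit group 1 if it matched, else nothing.
    kept = combined.sub(lambda m: m.group(1) or "", text)
    # Rough stand-in for the blank-line cleanup: drop lines that are now empty.
    return "\n".join(line for line in kept.splitlines() if line.strip())


# Hypothetical sample input (an assumption, not from the real files).
sample = (
    '<p>Sales: <a href="mailto:sales@example.org">write to us</a></p>\n'
    "<p>No address here</p>\n"
    "<p>Support: help@example.org</p>"
)

result = keep_only_matches(sample)
print(result)
```

Note that two addresses on the same line would end up run together, because the characters separating them are deleted too.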
Upvotes: 1