Reputation: 41
I'm trying to extract the "email" with this code
const regex3 = /Email',\r\n value: '([^']*)',/gm;
var content3 = fs.readFileSync('message.txt')
let m3;
while ((m3 = regex3.exec(content)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (m3.index === regex3.lastIndex) {
regex3.lastIndex++;
}
// The result can be accessed through the `m`-variable.
m3.forEach((match, groupIndex) => {
fs.appendFileSync('messagematch.txt', m3[1] + '\n');
});
}
From this file
},
MessageEmbedField {
embed: [Circular *2],
name: 'Email',
value: '[email protected]',
inline: true
},
MessageE
The regex works in Notepad, but not in my script. What am I missing?
Upvotes: 2
Views: 93
Reputation: 163297
"what I'm missing?"
You are using \r\n to match a Windows-style line break, but you can make the \r optional to also match a Unix-style one.
You define var content3, but you use it like regex3.exec(content).
You could use \s+ instead of hardcoding the number of spaces, but \s can also match a newline.
If you want to match whitespace without newlines, you could use a negated character class, [^\S\r\n], which matches any character except a non-whitespace character, \r, or \n.
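The points about line endings and whitespace can be checked in isolation. This is a minimal sketch; the sample strings and the address user@example.com are made up for illustration and are not taken from the question's file.

```javascript
// Hypothetical samples: the same two lines with Unix (\n) and Windows (\r\n) line endings.
const unixText = "name: 'Email',\n    value: 'user@example.com',";
const windowsText = "name: 'Email',\r\n    value: 'user@example.com',";

// A hardcoded \r\n only matches the Windows sample.
const strict = /'Email',\r\n/;
// Making \r optional matches both.
const tolerant = /'Email',\r?\n/;

console.log(strict.test(unixText), strict.test(windowsText));     // false true
console.log(tolerant.test(unixText), tolerant.test(windowsText)); // true true

// \s+ crosses line breaks; [^\S\r\n]+ stays on one line.
const anyWhitespace = /a\s+b/;
const horizontalOnly = /a[^\S\r\n]+b/;

console.log(anyWhitespace.test("a \n b"), horizontalOnly.test("a \n b")); // true false
console.log(anyWhitespace.test("a   b"), horizontalOnly.test("a   b"));   // true true
```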
'Email',\r?\n[^\S\r\n]+value:[^\S\r\n]+'([^\s@']+@[^\s@']+)'
'Email',             Match literally
\r?\n                Match a newline
[^\S\r\n]+           Match 1+ whitespace chars except newlines
value:               Match literally
[^\S\r\n]+'          Match 1+ whitespace chars except newlines and '
(                    Capture group 1
[^\s@']+@[^\s@']+    Match an email like format
)'                   Close group 1 and match '
const regex3 = /'Email',\r?\n[^\S\r\n]+value:[^\S\r\n]+'([^\s@']+@[^\s@']+)'/g;
var content3 = ` },
MessageEmbedField {
embed: [Circular *2],
name: 'Email',
value: '[email protected]',
inline: true
},
MessageE `;
let m3;
while ((m3 = regex3.exec(content3)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (m3.index === regex3.lastIndex) {
regex3.lastIndex++;
}
console.log(m3[1]);
}
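As a variation on the loop above, the same pattern can be wrapped in a small function that uses String.prototype.matchAll. This is only a sketch: extractEmails is a hypothetical helper name, the sample text and address are made up, and the commented-out file read assumes message.txt is UTF-8.

```javascript
// Hypothetical helper: returns every captured address found in `text`.
function extractEmails(text) {
  const regex = /'Email',\r?\n[^\S\r\n]+value:[^\S\r\n]+'([^\s@']+@[^\s@']+)'/g;
  // matchAll requires the g flag and yields one match array per hit.
  return Array.from(text.matchAll(regex), (m) => m[1]);
}

// Made-up sample in the shape of the dump from the question, with Windows line endings.
const sample = [
  "MessageEmbedField {",
  "  embed: [Circular *2],",
  "  name: 'Email',",
  "  value: 'user@example.com',",
  "  inline: true",
  "},",
].join("\r\n");

console.log(extractEmails(sample)); // [ 'user@example.com' ]

// With a real file, pass the same variable you read into:
// const fs = require('fs');
// const content3 = fs.readFileSync('message.txt', 'utf8');
// console.log(extractEmails(content3));
```

Because the regex is created inside the function, there is no lastIndex state to carry over between calls.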
Upvotes: 1
Reputation: 27723
Maybe try your expression in s (single-line) mode:
/Email'\s*,\s*value:\s*'([^'\r\n]*)'/gs
const regex = /Email'\s*,\s*value:\s*'([^'\r\n]*)'/gs;
const str = ` },
MessageEmbedField {
embed: [Circular *2],
name: 'Email',
value: '[email protected]',
inline: true
},
MessageE `;
let m;
while ((m = regex.exec(str)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
// The result can be accessed through the `m`-variable.
m.forEach((match, groupIndex) => {
console.log(`Found match, group ${groupIndex}: ${match}`);
});
}
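For reference, here is a small check of what the s flag changes, sketched with a made-up two-line string. The flag (dotAll) only affects the . metacharacter; \s matches line breaks with or without it, so in the pattern above it is \s* that spans the line break.

```javascript
const twoLines = "a\nb";

// Without the s flag, . does not match a line break.
const dotPlain = /a.b/.test(twoLines);
// With the s flag (dotAll), . matches line breaks too.
const dotAll = /a.b/s.test(twoLines);

// \s matches a line break regardless of the s flag.
const spacePlain = /a\sb/.test(twoLines);
const spaceDotAll = /a\sb/s.test(twoLines);

console.log(dotPlain, dotAll);         // false true
console.log(spacePlain, spaceDotAll);  // true true
```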
If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com, where you can also see how it matches against sample inputs.
Upvotes: 0
Reputation: 247
I suggest changing your regex in a few ways to make it more robust and fault tolerant.
First, include the opening single quote before Email to avoid accidentally matching other fields where the word "Email" appears as part of a value.
Second, use \r?\n to match both Windows- and Unix-style line endings. I suspect this may be a large part of your issue, but I can't be sure.
Third, use \s+ instead of hardcoding a specific number of spaces. This helps avoid problems caused by minor formatting changes.
The final regex would look like this:
const regex = /'Email',\r?\n\s+value: '([^']*)',/gm
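To illustrate the first point, here is a rough comparison of the pattern with and without the leading quote. The field name 'Backup Email' and both values are invented for this sketch.

```javascript
const withoutQuote = /Email',\r?\n\s+value: '([^']*)',/;
const withQuote = /'Email',\r?\n\s+value: '([^']*)',/;

// Hypothetical field whose name merely ends in "Email".
const otherField = "name: 'Backup Email',\n    value: 'not-the-one',";
// Hypothetical field that should be matched.
const emailField = "name: 'Email',\n    value: 'user@example.com',";

console.log(withoutQuote.test(otherField)); // true (false positive)
console.log(withQuote.test(otherField));    // false
console.log(withQuote.exec(emailField)[1]); // user@example.com
```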
Upvotes: 1
Reputation: 98921
You can try something like:
var test = `
},
MessageEmbedField {
embed: [Circular *2],
name: 'Email',
value: '[email protected]',
inline: true
},
Message
`;
var myregexp = /name: 'Email',\s+value: '(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b)',/img;
var match = myregexp.exec(test);
console.log(match[1]);
The regex above matches only values that look like email addresses. If you want to match anything (as the original did), use:
var myregexp = /name: 'Email',\s+value: '([^']*)',/img;
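Note that exec returns null when nothing matches, so match[1] would throw in that case. A guarded version might look like the sketch below; firstEmail is a hypothetical helper name, the inputs are made up, and the g flag is dropped on purpose so that no lastIndex state carries over between calls.

```javascript
// Same email-shaped pattern as above, without the g flag.
const pattern = /name: 'Email',\s+value: '(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b)',/im;

// Hypothetical helper: returns the first captured address, or null if none is found.
function firstEmail(text) {
  const match = pattern.exec(text);
  return match === null ? null : match[1];
}

console.log(firstEmail("name: 'Email',\n  value: 'user@example.com',")); // user@example.com
console.log(firstEmail("name: 'Email',\n  value: 'not an email',"));     // null
```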
Upvotes: 0