Nikky72

Reputation: 41

Extract email field with regex

I'm trying to extract the "Email" value with this code:

const regex3 = /Email',\r\n      value: '([^']*)',/gm;
var content3 = fs.readFileSync('message.txt')
let m3;

while ((m3 = regex3.exec(content)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m3.index === regex3.lastIndex) {
        regex3.lastIndex++;
    }

    // The result can be accessed through the `m`-variable.
    m3.forEach((match, groupIndex) => {
        fs.appendFileSync('messagematch.txt', m3[1] + '\n');
    });
}

from this file:

 },
MessageEmbedField {
  embed: [Circular *2],
  name: 'Email',
  value: 'user@example.com',
  inline: true
},
MessageE   

The regex works in Notepad, but not in my script. What am I missing?

Upvotes: 2

Views: 93

Answers (4)

The fourth bird

Reputation: 163297

What am I missing?

  • You use \r\n to match a Windows-style line break, but you can make the \r optional (\r?\n) to also match a Unix-style one. See this page about line break characters.
  • In your code you declare var content3, but then call regex3.exec(content)
  • Also, the number of spaces before value: in your pattern differs from the indentation in the example data
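To see the line-ending problem from the first bullet in isolation, the sketch below runs a \r\n pattern and a \r?\n pattern against the same two lines with LF and CRLF endings. The indentation is the same (2 spaces) in pattern and data so only the line ending differs; the sample text and the address user@example.com are made up for illustration:

```javascript
// Same two-line snippet with Unix (LF) and Windows (CRLF) line endings.
const lf = "name: 'Email',\n  value: 'user@example.com',";
const crlf = lf.replace(/\n/g, "\r\n");

// \r\n requires a carriage return, so it only matches the CRLF text.
const strict = /'Email',\r\n  value: '([^']*)'/;
// \r? makes the carriage return optional, so both variants match.
const tolerant = /'Email',\r?\n  value: '([^']*)'/;

console.log(strict.test(lf));        // false
console.log(strict.test(crlf));      // true
console.log(tolerant.exec(lf)[1]);   // user@example.com
console.log(tolerant.exec(crlf)[1]); // user@example.com
```

The patterns have no g flag, so test() and exec() do not carry lastIndex over between calls.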

You could use \s+ instead of hardcoding the number of spaces, but \s also matches newlines.

If you want to match whitespace without newlines, you could use the negated character class [^\S\r\n], which matches any whitespace character except \r and \n.
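A small sketch contrasting the two options: \s+ will happily run across a line break, while [^\S\r\n]+ stops at it but still accepts spaces and tabs. The input strings are hypothetical:

```javascript
// \s includes \r and \n, so \s+ can run across line breaks.
const anyWhitespace = /value:\s+'([^']*)'/;
// [^\S\r\n] = "not (non-whitespace, CR or LF)", i.e. whitespace other than line breaks.
const sameLine = /value:[^\S\r\n]+'([^']*)'/;

const split = "value:\n  'user@example.com'"; // quoted value is on the next line
const inline = "value: \t'user@example.com'"; // space and tab, same line

console.log(anyWhitespace.test(split)); // true  (crossed the newline)
console.log(sameLine.test(split));      // false (stops at the newline)
console.log(sameLine.exec(inline)[1]);  // user@example.com
```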

'Email',\r?\n[^\S\r\n]+value:[^\S\r\n]+'([^\s@']+@[^\s@']+)'
  • 'Email', Match literally
  • \r?\n Match a Unix (\n) or Windows (\r\n) newline
  • [^\S\r\n]+ Match 1+ whitespace chars except newlines
  • value: Match literally
  • [^\S\r\n]+' Match 1+ whitespace chars except newlines and '
  • ( Capture group 1
    • [^\s@']+@[^\s@']+ Match an email-like format
  • )' Close group 1 and match '

Regex demo

const regex3 = /'Email',\r?\n[^\S\r\n]+value:[^\S\r\n]+'([^\s@']+@[^\s@']+)'/g;
var content3 = ` },
MessageEmbedField {
  embed: [Circular *2],
  name: 'Email',
  value: 'user@example.com',
  inline: true
},
MessageE `;
let m3;

while ((m3 = regex3.exec(content3)) !== null) {
  // This is necessary to avoid infinite loops with zero-width matches
  if (m3.index === regex3.lastIndex) {
    regex3.lastIndex++;
  }
  console.log(m3[1]);
}
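To tie this back to the question's file-based script, here is a sketch that reads message.txt as UTF-8 (without an encoding, fs.readFileSync returns a Buffer), collects matches with String.prototype.matchAll, and appends each address to messagematch.txt. The function names extractEmails and extractFile are hypothetical, and matchAll assumes Node.js 12 or later:

```javascript
const fs = require('fs');

// Pattern from the answer above: accepts LF or CRLF, same-line whitespace only.
const emailField = /'Email',\r?\n[^\S\r\n]+value:[^\S\r\n]+'([^\s@']+@[^\s@']+)'/g;

// Return every captured address. matchAll requires the g flag and iterates
// over an internal copy of the regex, so emailField.lastIndex is never touched.
function extractEmails(text) {
  return Array.from(text.matchAll(emailField), (m) => m[1]);
}

// File names default to the ones used in the question.
function extractFile(inPath = 'message.txt', outPath = 'messagematch.txt') {
  const emails = extractEmails(fs.readFileSync(inPath, 'utf8'));
  // One line per address, written in a single call.
  if (emails.length > 0) fs.appendFileSync(outPath, emails.join('\n') + '\n');
  return emails;
}
```

Calling extractFile() with no arguments uses the question's file names. Unlike the forEach in the question, which appends m3[1] once per element of the match array, this writes each address once.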

Upvotes: 1

Emma

Reputation: 27723

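Maybe try your expression with \s* between the tokens so it can span the line break (the s, single-line or dotAll, flag is included below, but since the pattern contains no ., it does not change the result):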

/Email'\s*,\s*value:\s*'([^'\r\n]*)'/gs

Test

const regex = /Email'\s*,\s*value:\s*'([^'\r\n]*)'/gs;
const str = ` },
MessageEmbedField {
  embed: [Circular *2],
  name: 'Email',
  value: 'user@example.com',
  inline: true
},
MessageE `;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
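Since the pattern above contains no ., the s (dotAll) flag should not change what it matches; the \s* around the comma is what crosses the newline. A quick check, running the same pattern with and without the flag on a hypothetical CRLF sample:

```javascript
// Identical patterns except for the s flag, which only affects ".".
const withS = /Email'\s*,\s*value:\s*'([^'\r\n]*)'/gs;
const withoutS = /Email'\s*,\s*value:\s*'([^'\r\n]*)'/g;

const str = "  name: 'Email',\r\n  value: 'user@example.com',\n  inline: true";

// Collect capture group 1 from every match.
const capture = (re) => Array.from(str.matchAll(re), (m) => m[1]);
console.log(capture(withS));    // [ 'user@example.com' ]
console.log(capture(withoutS)); // [ 'user@example.com' ]
```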


If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. If you'd like, you can also see at this link how it matches against some sample inputs.


RegEx Circuit

[Figure: jex.im visualization of the regular expression]

Upvotes: 0

saritonin

Reputation: 247

I suggest changing your regex in a few ways to make it more robust and fault tolerant.

First, include the opening single quote before Email to avoid accidentally matching other fields where someone may have put the word "Email" in a value.

Second, use \r?\n to match both Windows- and Unix-style line endings. I suspect this may be a large part of your issue, but I can't be sure.

Third, use \s+ instead of specifically including a number of spaces. This will help to avoid problems caused by minor formatting changes.

The final regex would look like this:

const regex = /'Email',\r?\n\s+value: '([^']*)',/gm
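A sketch of why the leading quote matters: when another field's name merely ends in "Email", the pattern without the quote picks up that field too. The "Backup Email" field, both addresses, and the CRLF line endings are hypothetical:

```javascript
// The final regex above, and the same pattern without the leading quote.
const withQuote = /'Email',\r?\n\s+value: '([^']*)',/gm;
const withoutQuote = /Email',\r?\n\s+value: '([^']*)',/gm;

// Hypothetical sample with CRLF endings: a "Backup Email" field precedes the real one.
const text = [
  "  name: 'Backup Email',",
  "  value: 'backup@example.com',",
  "  name: 'Email',",
  "  value: 'user@example.com',",
].join("\r\n");

const capture = (re) => Array.from(text.matchAll(re), (m) => m[1]);
console.log(capture(withoutQuote)); // [ 'backup@example.com', 'user@example.com' ]
console.log(capture(withQuote));    // [ 'user@example.com' ]
```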

Upvotes: 1

Pedro Lobito

Reputation: 98921

You can try something like:

var test = `
    },
    MessageEmbedField {
      embed: [Circular *2],
      name: 'Email',
      value: 'user@example.com',
      inline: true
    },
    Message
`;

var myregexp = /name: 'Email',\s+value: '(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b)',/img;
var match = myregexp.exec(test);
console.log(match[1]);


The regex above only matches email-shaped values (it is a format check, not full address validation). If you want to match any value, as your original pattern did, use:

var myregexp = /name: 'Email',\s+value: '([^']*)',/img;
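The difference between the two patterns shows up when a field holds something that is not an address. The sample values below are hypothetical:

```javascript
// Email-shaped values only vs. any quoted value (both patterns from this answer).
const strict = /name: 'Email',\s+value: '(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b)',/img;
const loose = /name: 'Email',\s+value: '([^']*)',/img;

// Hypothetical embed text: one address and one placeholder value.
const test = [
  "  name: 'Email',",
  "  value: 'user@example.com',",
  "  name: 'Email',",
  "  value: 'not provided',",
].join("\n");

const capture = (re) => Array.from(test.matchAll(re), (m) => m[1]);
console.log(capture(strict)); // [ 'user@example.com' ]
console.log(capture(loose));  // [ 'user@example.com', 'not provided' ]
```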

Regex Demo & Explanation

Upvotes: 0
