mightyWOZ

Reputation: 8315

String.match() function returns null even when there is a match

I am trying to match a regex against the contents of a file, but the match function returns null even though a match clearly exists in the data.

I tried the same data and regex on RegExr, and it shows a match.

The code is as follows:

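var fs = require('fs');
try {
    var data = fs.readFileSync('File.txt', 'utf8');
    // Already a string because of the 'utf8' encoding; toString() is a no-op here
    data = data.toString();
    // Lines that start with "hi" (any case), then whitespace, then a character other than d/D
    var regex = /^(hi|hI|Hi|HI)\s[^dD].*?$/gm;
    var result = data.match(regex);
} catch(e) {
    console.log('Error:', e.stack);
}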

These are the contents of the file:

Hi Alex how are you doing
hI dave how are you doing
Good by Alex
hidden agenda
Alex greeted Martha by saying Hi Martha

I used the same data on RegExr and it shows the first line as a match, but when I run the code above on my machine, the result variable remains null.

Is there something I am missing?

Below are some screenshots taken while I was debugging the code in VS Code.

  1. Contents of the data variable [screenshot]

  2. State of the result variable [screenshot]

  3. Result of JSON.stringify [screenshot]

Edit: JSON.stringify results

I ran the program in cmd and, surprisingly, the string has a space in front of it.

Upvotes: 2

Views: 1129

Answers (3)

mightyWOZ

Reputation: 8315

The problem was solved with the help of @vsemozhetbyt's answer and many helpful comments. I am adding this answer in case anybody encounters the same problem in the future.

Why did it happen?

Because of an encoding issue. In my case, File.txt was originally File.jsp; I changed its extension to .txt and saved it. At that point I read File.txt, which still contained the JSP text, and ran some regex matching against it. That worked fine, and there was no BOM in the file.

The problem appeared when I opened the file in Notepad, replaced all its contents with the 5 lines of text shown in the question, and saved it. Saving from Notepad apparently added a BOM at the start of the file.

What is a BOM?

Read this great article

How I removed the BOM

I opened the file in binary mode in vim using

vim -b File.txt

and removed the first three characters (bytes).
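The same byte-level removal can also be done in Node instead of vim. The following is only a sketch, assuming the file is UTF-8, where the BOM is encoded as the three bytes EF BB BF. The helper name dropUtf8Bom is made up for illustration, and the sample buffer stands in for what fs.readFileSync('File.txt') returns when called without an encoding.

```javascript
// The UTF-8 encoding of the BOM (U+FEFF) is the three bytes EF BB BF.
const UTF8_BOM = Buffer.from([0xef, 0xbb, 0xbf]);

// Hypothetical helper: return the buffer without a leading UTF-8 BOM.
function dropUtf8Bom(buf) {
  const hasBom = buf.length >= 3 && buf.subarray(0, 3).equals(UTF8_BOM);
  return hasBom ? buf.subarray(3) : buf;
}

// Stand-in for the raw bytes of File.txt (assumed to start with a BOM).
const bytes = Buffer.concat([
  UTF8_BOM,
  Buffer.from('Hi Alex how are you doing', 'utf8'),
]);

const cleaned = dropUtf8Bom(bytes);
console.log(bytes.length - cleaned.length); // number of bytes removed
console.log(cleaned.toString('utf8'));
```

Writing the cleaned buffer back with fs.writeFileSync would make the change permanent, much like saving from vim.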

Upvotes: 3

Wiktor Stribiżew

Reputation: 626747

When you read a file with fs.readFileSync, the BOM is not stripped from the data, and it is up to the programmer to handle it. See "fs.readFileSync(filename, 'utf8') doesn't strip BOM markers".

You may just use

data = data.replace(/^\uFEFF/, '')

This will remove the BOM if it is there, and then you may run your regex.
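To see the effect on the regex from the question, here is a small sketch. The string raw is an assumption standing in for what fs.readFileSync('File.txt', 'utf8') returned, and stripBom is just an illustrative wrapper around the replace call above.

```javascript
// Stand-in for the file contents from the question, with a leading BOM (U+FEFF).
const raw = '\uFEFFHi Alex how are you doing\n' +
  'hI dave how are you doing\n' +
  'Good by Alex\n' +
  'hidden agenda\n' +
  'Alex greeted Martha by saying Hi Martha';

// Hypothetical helper: drop a single leading BOM, leave everything else untouched.
function stripBom(text) {
  return text.replace(/^\uFEFF/, '');
}

const regex = /^(hi|hI|Hi|HI)\s[^dD].*?$/gm;

// The BOM sits between the start of the line and "Hi", so nothing matches.
const before = raw.match(regex);
// After stripping, the first line starts with "Hi" again.
const after = stripBom(raw).match(regex);

console.log(before); // null
console.log(after);  // [ 'Hi Alex how are you doing' ]
```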

Note that you do not see the BOM when opening text files in editors such as Vim or Notepad, because they handle the BOM themselves.

Upvotes: 2

vsemozhebuty

Reputation: 13782

The space in the JSON output seems to be a BOM. If so, data.codePointAt(0) should be 65279 (0xFEFF).
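A quick way to check this could look like the sketch below. The data string is an assumption standing in for the file contents, and showBom is a made-up helper that makes the otherwise invisible character visible when printing.

```javascript
// Stand-in for the string read from File.txt, assumed to begin with a BOM.
const data = '\uFEFFHi Alex how are you doing';

const first = data.codePointAt(0);
console.log(first);              // 65279
console.log(first.toString(16)); // feff
console.log(first === 0xfeff);   // true

// Hypothetical helper: replace each BOM with the visible text "\uFEFF".
function showBom(text) {
  return text.replace(/\uFEFF/g, '\\uFEFF');
}

console.log(showBom(data));
```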

Upvotes: 2
