user1575921

Reputation: 1088

regex match slow

I'm trying to let users upload files larger than 2 MB via Ajax to a Node.js app.
On the client side I use the FileReader API to read the file as base64, then send it through FormData.

My problem is that the server-side code below is very slow. I added console.log calls to find the slow part, and with larger files it seems to get stuck at the regex match.
Any suggestions on how to improve this?

https://regex101.com/r/qS2lB2/1

...
console.log(image.data_base64);  
// ' ...
var matches = image.data_base64.match(/^data:.+\/(.+);base64,(.*)$/);
console.log('done');  // slow

var fileExtension = matches[1];
var base64 = matches[2];
var buffer = Buffer.from(base64, 'base64');  // new Buffer() is deprecated

...
yield Promise.resolve( filesystem().writeFile(temporaryFilePath, buffer) );

Upvotes: 2

Views: 1637

Answers (3)

Laurel

Reputation: 6173

Additional length means there's more string that the regex must travel through.

Testing your regex (using regex101.com, PHP mode) on strings starting with :

Characters added | Steps
0                | 63
1                | 68
2                | 73
10               | 113
100              | 563

Each additional character adds 5 steps.

How to fix the regex

(based on characters added=100 taking 563 steps)

  • Your biggest problems are the .+s

    • Replacing the first one with .+? takes it down to 248 steps
    • Replacing the second with .+? takes it from 248 to 34 steps

The cause of performance issues

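Excessive backtracking. The growth measured above is linear rather than the exponential kind usually called catastrophic, but it still adds up on a multi-megabyte string. .+ is greedy: it eats up the rest of the string, and if the pattern still needs to match more characters, it has to go back, releasing characters one by one. .+? is lazy, meaning it tries to move on in the regex as soon as possible, consuming as few characters as possible.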
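A minimal sketch of the lazy variant, assuming the input has the same shape as in the question. The helper name parseDataUriLazy and the sample string are made up for illustration:

```javascript
// Lazy quantifiers (.+?) stop at the first "/" and the first ";base64,"
// instead of running to the end of the string and backtracking from there.
const LAZY_DATA_URI = /^data:.+?\/(.+?);base64,(.*)$/;

// Hypothetical helper: returns null when the input is not a base64 data URI.
function parseDataUriLazy(dataUri) {
  const matches = dataUri.match(LAZY_DATA_URI);
  if (matches === null) {
    return null;
  }
  return { fileExtension: matches[1], base64: matches[2] };
}

// Made-up sample input; "aGVsbG8=" is the base64 encoding of "hello".
const parsed = parseDataUriLazy('data:image/png;base64,aGVsbG8=');
console.log(parsed.fileExtension); // png
```

Checking for null before indexing into the match avoids a TypeError when the client sends something that is not a data URI.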

Upvotes: 1

Wiktor Stribiżew

Reputation: 626926

Just in case you still want to use a regex for some reason, the performance can be improved by replacing non-trailing .+ subpatterns with appropriate negated character classes that involve much less backtracking.

Use

/^data:[^\/]+\/([^;]+);base64,(.*)$/

See regex demo.

Explanation:

  • ^ - start of string
  • data: - literal char sequence data:
  • [^\/]+ - 1+ characters other than /
  • \/ - a literal slash
  • ([^;]+) - Group 1: 1+ characters other than ;
  • ;base64, - a literal char sequence ;base64,
  • (.*) - Group 2: 0+ any characters but a newline
  • $ - end of string.
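The pattern explained above could be used on the server roughly as follows. This is a sketch, not the asker's actual code: decodeDataUri is a hypothetical helper name and the sample data URI is made up.

```javascript
// Negated character classes cannot run past "/" or ";", so the engine
// never has to backtrack across the large base64 payload.
const DATA_URI = /^data:[^\/]+\/([^;]+);base64,(.*)$/;

// Hypothetical helper: splits a base64 data URI into extension and bytes.
function decodeDataUri(dataUri) {
  const matches = DATA_URI.exec(dataUri);
  if (matches === null) {
    throw new Error('Not a base64 data URI');
  }
  return {
    fileExtension: matches[1],
    // Buffer.from is the non-deprecated way to decode a base64 string.
    buffer: Buffer.from(matches[2], 'base64'),
  };
}

// Made-up sample input; "aGVsbG8=" is the base64 encoding of "hello".
const decoded = decodeDataUri('data:text/plain;base64,aGVsbG8=');
console.log(decoded.fileExtension, decoded.buffer.toString('utf8')); // plain hello
```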

Upvotes: 2

mscdex

Reputation: 106696

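Sending a file base64-encoded in a multipart/form-data request is unnecessary. FileReader has a .readAsArrayBuffer() method that will give you the raw data (as an ArrayBuffer), which you can wrap in a Blob and pass to formData.append(). Since a File is already a Blob, you can also append the File object itself and skip FileReader entirely.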
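A rough client-side sketch of the raw upload. formData.append() takes strings and Blob/File values, so the simplest route is to hand it the File itself. The helper name buildUploadForm, the field name "file", and the fallback filename are assumptions for illustration:

```javascript
// Hypothetical helper: builds a multipart body from a File or Blob.
// The field name "file" is an assumption, not something from the question.
function buildUploadForm(fileOrBlob) {
  const formData = new FormData();
  // A File is already a Blob, so it can be appended as raw bytes
  // without FileReader and without base64 encoding.
  formData.append('file', fileOrBlob, fileOrBlob.name || 'upload.bin');
  return formData;
}

// In a browser this would typically be called with input.files[0];
// a small in-memory Blob stands in for the file here.
const form = buildUploadForm(new Blob(['hello'], { type: 'text/plain' }));
console.log(form.get('file').size); // 5
```

On the server the upload then arrives as a regular multipart file part, so neither the regex nor the base64 decoding step is needed.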

Upvotes: 0
