Reputation: 107
DLVARScriptTesting.rb:175:in `sub!': incompatible encoding regexp match (UTF-8 regexp with IBM437 string) (Encoding::CompatibilityError)
from DLVARScriptTesting.rb:175:in `block in parse_file'
from DLVARScriptTesting.rb:171:in `each'
from DLVARScriptTesting.rb:171:in `each_with_index'
from DLVARScriptTesting.rb:171:in `parse_file'
from DLVARScriptTesting.rb:371:in `<main>'
That's the full error.
Here are lines 171 and 175:
File.readlines(testfile).each_with_index do |line, line_num|
line.sub!(/^\xEF\xBB\xBF/, '') if line_num == 0
I've tried setting the encoding to UTF-8, but that isn't working. Basically, what the code is trying to do is remove xEF xBB xBF (the UTF-8 byte-order mark) from the start of a string if it's there.
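A minimal way to reproduce the error, assuming the script file itself is saved as UTF-8 and the line arrives tagged as IBM437 (the encoding named in the message, typical of a Windows console). The sample string "hello" is made up for illustration.

```ruby
# Simulate a BOM-prefixed line that Ruby has tagged as IBM437
line = "\xEF\xBB\xBFhello\n".b.force_encoding('IBM437')

begin
  # In a UTF-8 source file this literal becomes a UTF-8 regexp with a fixed
  # encoding, so Ruby refuses to match it against a non-ASCII IBM437 string.
  line.sub!(/^\xEF\xBB\xBF/, '')
rescue Encoding::CompatibilityError => e
  error_message = e.message
end

puts error_message
```

Setting the script's encoding doesn't help here because the mismatch is between the regexp (UTF-8) and the data read from the file (IBM437).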
Upvotes: 0
Views: 621
Reputation: 160551
... Basically what the code is trying to do is remove xEF xBB xBF before a string if it's there.
Why not skip the regex and use a substring comparison and a slice instead? Something like this untested code:
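# Compare raw bytes: a UTF-8 "\xef\xbb\xbf" literal won't compare equal to a non-ASCII IBM437 string
line = line.byteslice(3..-1) if line.byteslice(0, 3).bytes == [0xEF, 0xBB, 0xBF]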
Regular expressions are useful, but they're hardly a replacement for string slicing and dicing. They can also cause major slowdowns when a pattern forces the engine into a lot of backtracking. So use them when they're appropriate, and use Benchmark or Fruity to compare a regular expression against an equivalent operation using plain String methods.
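A rough sketch of such a comparison using only Ruby's standard Benchmark library (Fruity is a separate gem and isn't used here). The method names strip_bom_regex and strip_bom_slice and the sample line are hypothetical; both versions work on raw bytes so the IBM437 string from the question doesn't trigger the encoding error. Timings will vary by machine and Ruby version.

```ruby
require 'benchmark'

BOM_BYTES = [0xEF, 0xBB, 0xBF].freeze
# /n makes this a binary (ASCII-8BIT) regexp, matched against a binary copy of the line
BOM_REGEX = /\A\xEF\xBB\xBF/n

# Hypothetical regex version: match a binary copy so encodings never clash
def strip_bom_regex(line)
  line.b.match?(BOM_REGEX) ? line.byteslice(3..-1) : line
end

# Hypothetical slice version: compare the first three raw bytes directly
def strip_bom_slice(line)
  line.byteslice(0, 3).bytes == BOM_BYTES ? line.byteslice(3..-1) : line
end

# A BOM-prefixed line tagged as IBM437, like the one in the question
sample = ("\xEF\xBB\xBF".b + "some,data\n").force_encoding('IBM437')
n = 100_000

Benchmark.bm(7) do |x|
  x.report('regex:') { n.times { strip_bom_regex(sample) } }
  x.report('slice:') { n.times { strip_bom_slice(sample) } }
end
```

Both methods return the same result, so the benchmark compares equivalent work; checking that equivalence first is what makes the timing comparison meaningful.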
Also, for scalability, don't do:
File.readlines(testfile).each_with_index
readlines reads an entire file into memory and converts it into an array. What happens if your code moves from dev to production and the file being read suddenly grows from 1K to 500MB? You'll see a major slowdown as Ruby slurps the file and then converts it into an array in memory. In my world, 500MB is small and multi-GB files are the norm.
Instead, use foreach, as in File.foreach(testfile).with_index, or better, don't bother with each_with_index or with_index and instead look at $., which is the current line number of the file being read. foreach reads the file line-by-line, which is as fast as, or faster than, slurping a file.
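A minimal sketch of the foreach plus $. approach, combined with a byte comparison for the BOM. The helper name each_clean_line and the Tempfile demo file are hypothetical; $. holds the 1-based number of the line most recently read.

```ruby
require 'tempfile'

BOM = [0xEF, 0xBB, 0xBF].freeze

# Hypothetical helper: yields each line of +path+, stripping a UTF-8 BOM
# from the first line. File.foreach reads one line at a time instead of
# building an array of the whole file the way File.readlines does.
def each_clean_line(path)
  File.foreach(path) do |line|
    # $. is the number of the line just read, starting at 1
    if $. == 1 && line.byteslice(0, 3).bytes == BOM
      line = line.byteslice(3..-1)
    end
    yield line
  end
end

# Demo: write a small BOM-prefixed file in binary mode, then read it back
file = Tempfile.new('bom_demo')
file.binmode
file.write("\xEF\xBB\xBF".b + "first\nsecond\n")
file.close

lines = []
each_clean_line(file.path) { |line| lines << line }
p lines # prints ["first\n", "second\n"]
file.unlink
```

Only the first line is ever checked, so the per-line cost for the rest of a large file is a single integer comparison on $..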
Upvotes: 1