Reputation: 1555
So I saved some objects to disk using the following code (this is Ruby 1.9.2 on Windows, BTW):
require 'yaml'

open('1.txt', "wb") { |file|
  file.write(YAML::dump(results))
}
Now I'm trying to read that data back, but I get 'invalid byte sequence in UTF-8 (ArgumentError)'. I've tried everything I could think of, including saving the data in a different format, but with no luck. For example:
require 'yaml'
require 'json'

a1 = open('1.txt', 'rb') { |f| YAML::load(f.read) }
a1.each do |a|
  JSON.generate(a)
end
results in:
C:/m/ruby-1.9.2-p0-i386-mingw32/lib/ruby/1.9.1/json/common.rb:212:in `match': invalid byte sequence in UTF-8 (ArgumentError)
        from C:/m/ruby-1.9.2-p0-i386-mingw32/lib/ruby/1.9.1/json/common.rb:212:in `generate'
        from C:/m/ruby-1.9.2-p0-i386-mingw32/lib/ruby/1.9.1/json/common.rb:212:in `generate'
        from merge3.rb:31:in `block in <main>'
        from merge3.rb:29:in `each'
        from merge3.rb:29:in `<main>'
What can I do?
EDIT: here is an excerpt from the file:
---
- !ruby/object:Product
  name: HSF
- !ruby/object:Product
  name: "almer\xA2n"
The first product works fine, but the second one raises the exception.
Upvotes: 0
Views: 630
Reputation: 160571
I'm not sure if this is what you're after, but currently your YAML file looks like this:
---
- !ruby/object:Product
  name: HSF
- !ruby/object:Product
  name: "almer\xA2n"
If you remove the !ruby/object:Product tag from the array lines, you'll get an array of hashes:
---
- name: HSF
- name: "almer\xA2n"
Loading that file results in:
YAML::load_file('test.yaml') #=> [{"name"=>"HSF"}, {"name"=>"almer\xA2n"}]
If I print the second element's value with my terminal set to a Windows character set, I see the cent sign (¢). So if you're just trying to regain access to the data, a little manipulation of the data file is all you need.
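The tag-stripping idea can be sketched in a few lines. This is a minimal sketch, assuming a recent Ruby whose YAML engine is Psych (YAML.safe_load did not exist in Ruby 1.9.2); load_products_as_hashes is a hypothetical helper name, and the inline dump simply reproduces the excerpt from the question. Psych may decode the "\xA2" escape itself, so on a newer Ruby the second name can come out as a valid character rather than a raw byte.

```ruby
require 'yaml'

# Hypothetical helper: removes the "!ruby/object:Product" tags from a YAML
# dump so every element loads as a plain Hash and no Product class is needed.
def load_products_as_hashes(yaml_text)
  # "- !ruby/object:Product\n  name: HSF" becomes "- \n  name: HSF",
  # which is still a valid sequence entry holding a mapping.
  untagged = yaml_text.gsub('!ruby/object:Product', '')
  # Only strings, arrays and hashes remain, so safe_load is sufficient.
  YAML.safe_load(untagged)
end

# Same content as the file excerpt; the single-quoted heredoc keeps the
# \xA2 escape as literal text for the YAML parser to interpret.
dump = <<~'YAML'
  ---
  - !ruby/object:Product
    name: HSF
  - !ruby/object:Product
    name: "almer\xA2n"
YAML

products = load_products_as_hashes(dump)
products.each { |h| puts h['name'] }
```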
Upvotes: 0
Reputation: 211670
This is probably an encoding mismatch. You could try this:
Encoding.default_external = 'BINARY'
This should make Ruby read the file in as raw bytes instead of interpreting it as UTF-8. You are presumably using some kind of ISO-8859-1 accented character.
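A minimal sketch of the default_external approach, assuming a recent Ruby and assuming the file really contains ISO-8859-1 bytes; the temporary file name is made up for the demo. Because default_external is process-wide, the sketch restores the previous value afterwards.

```ruby
require 'tmpdir'

# Assumption: the data file holds single-byte ISO-8859-1 text, such as the
# byte 0xA2 seen in the YAML excerpt.
path = File.join(Dir.tmpdir, 'encoding_demo.txt')
File.binwrite(path, "almer\xA2n".b)

previous = Encoding.default_external
begin
  # With BINARY (ASCII-8BIT) as the default external encoding, File.read
  # returns raw bytes instead of a possibly invalid UTF-8 string.
  Encoding.default_external = 'BINARY'
  raw = File.read(path)
ensure
  # Restore the global setting, since it affects every file opened later.
  Encoding.default_external = previous
end

# Raw bytes are always "valid", but they still have to be transcoded before
# being handed to code that expects UTF-8, such as JSON.generate.
text = raw.encode('UTF-8', 'ISO-8859-1')
puts text
```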
Upvotes: 1
Reputation: 369526
Obviously, you need to read the file back in using the same encoding you used to write it out. Since you don't specify an encoding in either case, you end up with an environment-dependent encoding outside your control, which is why you should always specify the encoding explicitly.
The snippet you posted is clearly not valid UTF-8 (the byte 0xA2 is a UTF-8 continuation byte and cannot follow a plain ASCII character such as "r"), so getting an exception is perfectly appropriate.
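A sketch of that advice: confirm the string is invalid UTF-8, transcode it once, then name the encoding explicitly on both the write and the read. The source encoding ISO-8859-1 is an assumption (the bytes might come from a Windows code page instead), and the file name is made up for the demo.

```ruby
require 'tmpdir'

# The bytes from the question, labelled as UTF-8 like the failing strings.
data = "almer\xA2n".dup.force_encoding('UTF-8')
puts data.valid_encoding? # a lone 0xA2 is not valid UTF-8

# Assumption: the original text was ISO-8859-1; transcode it to UTF-8 once.
utf8 = data.encode('UTF-8', 'ISO-8859-1')

path = File.join(Dir.tmpdir, 'explicit_encoding_demo.txt')
# State the encoding on both sides instead of relying on the environment.
File.open(path, 'w:UTF-8') { |f| f.write(utf8) }
round_trip = File.read(path, encoding: 'UTF-8')
puts round_trip
```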
Upvotes: 0