Reputation: 1542
VB6 program. I have a UTF-8 encoded file (not created by me) that I read values from, using FileSystemObject's ReadLine(). If I read a line into a String or Variant and look at the value in the debugger, it is shown as ANSI, with two ugly characters where the UTF-8 Spanish "ñ" should be. I can write that very string back out using FSO.WriteLine(), and when I open the resulting file in Notepad++, it recognizes the UTF-8 encoding and shows the character correctly. If I put the value in a TextBox, it again shows the ugly ANSI characters where the "ñ" is supposed to be.
If I read that same value by ID from my MS Access database with UTF-8 encoding and put it in a String, it displays correctly in the debugger, and if I then assign it to TextBox.Text, it shows correctly in the TextBox as well.
So the problem appears to be what is getting assigned to the String data type and how that String recognizes the encoding of the data that was just handed to it.
What am I missing? Why does the String variable recognize the UTF-8 encoding when the data is assigned to it from a DAO recordset object, but not when the same value is read from a UTF-8 encoded file? If I open that file in Notepad++, it seems to know the encoding and displays the characters correctly.
Thanks much for any assistance.
Upvotes: 2
Views: 3440
Reputation: 1542
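Thanks for the assistance, all. The issue is that FileSystemObject cannot decode UTF-8 files; it reads text as either ANSI or UTF-16 (Unicode). Read as ANSI, each UTF-8 byte becomes one character, so the two-byte "ñ" (bytes C3 B1) shows up as two characters, and writing the string back out unchanged reproduces the original UTF-8 bytes. It is answered in another post here: Read utf-8 text file in vbscript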
I was unaware of that point and really my understanding of encoding overall was quite weak. A bit better understanding now.
The solution offered there is to use the ADODB.Stream object to read UTF-8 files.
However, what I actually want is to import the CSV file into my Access database. After hours of searching, here is the code that does it; CharacterSet=65001 is the Windows code page number for UTF-8.
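For anyone who only needs to read the lines rather than import them, the ADODB.Stream approach could look roughly like the sketch below. It is a minimal, late-bound example, not code from the linked post: the function name ReadUtf8Lines and the path in the usage lines are made up for illustration, and the constants are declared locally so no ADO reference is needed.

```vb
' Sketch: read a UTF-8 text file line by line with ADODB.Stream (late-bound).
' ReadUtf8Lines is a hypothetical helper name, not part of any library.
Public Function ReadUtf8Lines(ByVal filePath As String) As Collection
    Const adTypeText As Long = 2    ' StreamTypeEnum: treat the stream as text
    Const adReadLine As Long = -2   ' StreamReadEnum: read up to the next line separator

    Dim stm As Object
    Dim lines As Collection

    Set lines = New Collection
    Set stm = CreateObject("ADODB.Stream")

    stm.Type = adTypeText
    stm.Charset = "utf-8"           ' decode the file's bytes as UTF-8 instead of ANSI
    stm.Open
    stm.LoadFromFile filePath

    ' Each ReadText(adReadLine) call returns one line without its separator.
    Do Until stm.EOS
        lines.Add stm.ReadText(adReadLine)
    Loop

    stm.Close
    Set ReadUtf8Lines = lines
End Function

' Usage (the path is an assumption for illustration):
'   Dim v As Variant
'   For Each v In ReadUtf8Lines("C:\Test\utf8-test.csv")
'       Debug.Print v
'   Next
```

The stream splits lines on CRLF by default; if the file uses bare LF line endings, setting stm.LineSeparator = 10 (adLF) before reading should be needed.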
db.Execute "Select * Into Test1 From [Text;CharacterSet=65001;FMT=CSVDelimited;HDR=YES;DATABASE=C:\Test\].[utf8-test.csv]"
Hope this helps others.
Upvotes: 2