Reputation: 21
I have been sent a dataset that contains product dictionaries for the US, UK, France, and Germany. With the German data, I'm having trouble displaying accented characters, etc.
I've tried spraying the data both as ASCII and as UTF8.
I've defined my record structure as
gbrec := RECORD
  STRING5 CountryId;
  INTEGER8 ProductId;
  INTEGER8 ABV;
  UTF8_de ProductDescription;
  INTEGER8 ProductItemId;
  INTEGER MultiBuys;
  STRING UomDescription;
  // ... remaining fields omitted
END;
I define the dataset as
ProductDictionary := Project(DISTRIBUTE(DATASET('~cga::ml_fullproductextract_20220808_UTF.txt', gbrec ,CSV(SEPARATOR('\t'))))(std.uni.ToUpperCase(ProductDescription[1..4]) != 'ANY ' AND std.uni.ToUpperCase(CGA_GenealogyLvl3Desc) NOT IN ['NA_BRAND FAMILY']),
I have used the UTF and ASCII versions with no joy. The data is displayed below.
Do you have any advice or suggestions? I've looked over the posts on the original forum, which is where I got these ideas from.
Any help would be appreciated.
Thanks
Upvotes: 1
Views: 44
Reputation: 780
David,
I would start by going back to the spray. ASCII will never work, so UTF8 would be my first choice. Since that does not work either, I would next look at the raw data in a hex editor to see exactly what I am dealing with. IOW, it is some form of Unicode, but which one exactly? Perhaps you could ask the data supplier?
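If a hex editor is not to hand, the same check can be roughed out in a few lines of Python run against the original file before it is sprayed. This is only a sketch under assumptions: it is Python rather than ECL, the helper names `hex_dump` and `guess_encoding` are made up for illustration, and the result is a hint rather than a definitive answer. It looks for a byte-order mark first and then tests whether the bytes are valid UTF-8.

```python
import codecs

# Byte-order marks, ordered so that the 4-byte UTF-32 marks are tested
# before the 2-byte UTF-16 marks that they begin with.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def hex_dump(raw: bytes, width: int = 16) -> str:
    """Format bytes as offset-prefixed rows of hex pairs, like a hex editor."""
    rows = []
    for offset in range(0, len(raw), width):
        chunk = raw[offset:offset + width]
        rows.append(f"{offset:08x}  " + " ".join(f"{b:02x}" for b in chunk))
    return "\n".join(rows)


def guess_encoding(raw: bytes) -> str:
    """Return a rough guess at the encoding of a sample of file bytes."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name
    # The incremental decoder tolerates a multi-byte character that is cut
    # off at the end of the sample, but still rejects invalid bytes.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(raw, final=False)
    except UnicodeDecodeError:
        # Not UTF-8: possibly a single-byte code page such as Windows-1252.
        return "not-utf-8"
    return "ascii" if all(b < 0x80 for b in raw) else "utf-8"
```

To use it, read the first few kilobytes of the file in binary mode (for example `open(path, 'rb').read(4096)`, where `path` is wherever the original extract lives) and pass them to both helpers. A result of "not-utf-8" would suggest the supplier exported a single-byte code page, in which case the file would need converting to UTF-8 before the spray.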
HTH,
Richard
Upvotes: 2