Encoding issues HPCC

Question

I have been sent a dataset that contains data for US, UK, France, and Germany product dictionaries. With the German data, I'm having trouble displaying accents, etc.

I've tried spraying the data both as ASCII and as UTF8.

I've defined my record structure as

gbrec := RECORD
STRING5 CountryId;
INTEGER8 ProductId;
INTEGER8 ABV;
UTF8_de ProductDescription;
INTEGER8 ProductItemId;
INTEGER MultiBuys;
STRING UomDescription;

I define the dataset as

ProductDictionary := PROJECT(DISTRIBUTE(DATASET('~cga::ml_fullproductextract_20220808_UTF.txt', gbrec, CSV(SEPARATOR('\t'))))(Std.Uni.ToUpperCase(ProductDescription[1..4]) != 'ANY ' AND Std.Uni.ToUpperCase(CGA_GenealogyLvl3Desc) NOT IN ['NA_BRAND FAMILY']),

I have used the UTF and ASCII versions with no joy. The data is displayed below.

[Image: VS Code screenshot]

Do you have any advice or suggestions? I've looked over the posts on the original forum, which is where I got these ideas from.

Any help would be appreciated.

Thanks

Problem Data
