missingfaktor

Reputation: 92026

Right way to deal with Unicode BOM in a text file

I am reading a text file in my program that contains the Unicode BOM character (\ufeff, decimal 65279) in places. This causes several problems in later parsing.

Right now I am detecting and filtering these characters myself, but I would like to know whether the Java standard library or Guava has a cleaner way to do this.

Upvotes: 4

Views: 6721

Answers (1)

Boris the Spider

Reputation: 61148

There is no built-in way of dealing with a (UTF-8) BOM in Java or, indeed, in Guava.

There is currently a bug report on the Guava website about dealing with a BOM in Guava IO.

There are several SO posts (here and here) on how to detect/skip the BOM while reading a file in plain Java.
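As a rough illustration of the plain-Java approach (this is a minimal sketch, not the code from the linked posts; the class name `BomFilter` and the methods `skipLeadingBom` and `stripAll` are hypothetical names chosen for this example), one way is to peek at the first decoded character and drop it if it is U+FEFF. Since the question mentions the character appearing "in places", the sketch also includes a helper that removes every occurrence from an already-read string.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;

// Hypothetical helper, written for illustration only.
final class BomFilter {
    // The BOM as it appears once the bytes have been decoded to chars.
    static final char BOM = '\uFEFF';

    // Returns a reader positioned after a single leading U+FEFF, if one is present.
    static Reader skipLeadingBom(Reader in) throws IOException {
        PushbackReader reader = new PushbackReader(in, 1);
        int first = reader.read();
        if (first != -1 && first != BOM) {
            // Not a BOM: push the character back so the caller still sees it.
            reader.unread(first);
        }
        return reader;
    }

    // Removes every U+FEFF from text that has already been read.
    static String stripAll(String text) {
        return text.replace(String.valueOf(BOM), "");
    }
}
```

For example, wrapping `BomFilter.skipLeadingBom(new StringReader("\uFEFFhello"))` in a `BufferedReader` and calling `readLine()` should yield `hello`.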

Your BOM (\ufeff) seems to be UTF-16, which, according to the same Guava report, should be dealt with automatically by Java. This SO post seems to suggest the same.
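The difference between the two encodings can be checked with a small experiment. The sketch below (the class name `BomDecodingDemo` and its members are made up for this example) decodes the letter "A" preceded by a BOM, once as UTF-8 and once as UTF-16. In my understanding of the standard charsets, the `UTF-16` decoder consumes the BOM to pick the byte order, while the `UTF-8` decoder passes it through as a `\uFEFF` character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, written for illustration only.
final class BomDecodingDemo {
    // "A" preceded by the UTF-8 encoding of the BOM (EF BB BF).
    static final byte[] UTF8_WITH_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x41};

    // "A" preceded by a big-endian UTF-16 BOM (FE FF).
    static final byte[] UTF16_WITH_BOM = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x41};

    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String fromUtf8 = decode(UTF8_WITH_BOM, StandardCharsets.UTF_8);
        String fromUtf16 = decode(UTF16_WITH_BOM, StandardCharsets.UTF_16);

        // Length 2 means the BOM survived decoding; length 1 means it was consumed.
        System.out.println("UTF-8 length:  " + fromUtf8.length());
        System.out.println("UTF-16 length: " + fromUtf16.length());
    }
}
```

If the decoded text still contains `\uFEFF`, that suggests the file was read with a decoder that does not strip the mark, so it is worth checking which charset the reader was opened with.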

Upvotes: 10
