Reputation: 4607
I am generating CSV files. Occasionally the data source passes along characters with accents, etc., that I would like to strip out. Is there a reasonably straightforward way to detect and strip out UTF-8 characters?
Upvotes: 0
Views: 98
Reputation: 52822
If you're sure you're getting UTF-8 as input, use iconv to convert the values to the encoding you're using in your output. Detecting UTF-8 characters isn't failsafe, as the same byte values are valid ISO-8859-1 characters as well (or valid in any 8-bit encoding, really).
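To make the detection caveat concrete, here is a minimal sketch of two checks. The helper names `has_non_ascii` and `is_valid_utf8` are hypothetical and only for illustration. The second check relies on the PCRE `u` modifier, which makes `preg_match` fail on malformed UTF-8.

```php
<?php
// Hypothetical helpers, not part of the original answer.

// True if the string contains any byte outside the 7-bit ASCII range (0-127).
function has_non_ascii(string $s): bool
{
    return preg_match('/[^\x00-\x7F]/', $s) === 1;
}

// True if the bytes form well-formed UTF-8. With the "u" modifier,
// preg_match() returns false instead of 1 when the subject is malformed.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}
```

Passing `is_valid_utf8` only shows that the bytes could be UTF-8, not that they were meant to be: pure ASCII text passes too, which is why knowing the input encoding up front is the safer route.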
If you just want to use the regular ASCII set of values (byte values 0 - 127), you can let iconv convert to the 'ascii' encoding and transliterate:
iconv("utf-8", "ascii//TRANSLIT", "Hei og hå")
will result in
Hei og ha
being returned (the exact transliteration depends on the iconv implementation and the current locale).
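For the CSV case, the iconv call above could be wrapped roughly like this. This is a sketch, not a definitive implementation: `to_ascii` is a hypothetical name, and the final `preg_replace` is an added safety net (an assumption, not part of the answer) for input that iconv rejects or cannot transliterate.

```php
<?php
// Hypothetical helper: reduce a UTF-8 string to plain ASCII for a CSV field.
function to_ascii(string $utf8): string
{
    // //TRANSLIT asks iconv to approximate characters it cannot represent.
    // The @ suppresses the notice iconv raises on malformed input.
    $out = @iconv('UTF-8', 'ASCII//TRANSLIT', $utf8);

    if ($out === false) {
        // Conversion failed (e.g. the input was not valid UTF-8);
        // fall back to the raw bytes and strip them below.
        $out = $utf8;
    }

    // Safety net: drop any byte that is still outside the ASCII range.
    return (string) preg_replace('/[^\x00-\x7F]/', '', $out);
}
```

Each value would be passed through `to_ascii()` before being written to the CSV, for example before handing the row to `fputcsv()`.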
Upvotes: 1
Reputation: 33512
utf8_decode($string)
This converts the string from UTF-8 to ISO-8859-1. It can, however, garble characters which are available in UTF-8 but not in ISO-8859-1: they are replaced with a question mark.
Upvotes: 0