toughtalker

Reputation: 471

Does grep support UTF-16?

Using TextEdit on Mac OS X, I created two files with the same contents but different encodings, then ran grep on each:

grep xxx filename_UTF-16

(no output)

grep xxx filename_UTF-8

xxxxxxx xxxxxxyyyyyy

Does grep not support UTF-16?

Upvotes: 7

Views: 2539

Answers (4)

kenorb

Reputation: 166389

Define the following shell function, which calls Ruby:

grep16() { ruby -e "puts File.open('$2', mode:'rb:BOM|UTF-16LE').readlines.grep(Regexp.new '$1'.encode(Encoding::UTF_16LE))"; }

Then use it as:

grep16 xxx filename_UTF-16

See: How to use Ruby's readlines.grep for UTF-16 files?

For more suggestions, check: grepping binary files and UTF16

Upvotes: -1

kenorb

Reputation: 166389

Use the ripgrep utility instead of grep; it can search UTF-16 files. Install it with: brew install ripgrep.

Then run:

rg xxx filename_UTF-16

ripgrep supports searching files in text encodings other than UTF-8, such as UTF-16, latin-1, GBK, EUC-JP, Shift_JIS and more. (Some support for automatically detecting UTF-16 is provided. Other text encodings must be specifically specified with the -E/--encoding flag.)

Upvotes: 1

hmontoliu

Reputation: 4019

iconv -f UTF-16 -t UTF-8 yourfile | grep xxx

Upvotes: 5

ninjalj

Reputation: 43688

You could always convert the file to UTF-8 first:

iconv -f utf-16 -t utf-8 filename | grep xxxxx
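
If you do this often, the pipeline above could be wrapped in a small shell function. This is only a sketch: the name grep_utf16 is made up for illustration, and it assumes the file starts with a byte order mark so that iconv can work out the byte order from "UTF-16" alone. For a file without a BOM you may need to name UTF-16LE or UTF-16BE explicitly.

```shell
# grep_utf16 PATTERN FILE [GREP_OPTIONS...]
# Hypothetical helper (not a standard tool): decode FILE from UTF-16
# to UTF-8 with iconv, then search the decoded text with grep.
grep_utf16() {
    pattern=$1
    file=$2
    shift 2
    # Any remaining arguments (e.g. -i, -n, -c) are passed through to grep.
    # The exit status is grep's: 0 on a match, 1 on no match.
    iconv -f UTF-16 -t UTF-8 "$file" | grep "$@" -e "$pattern"
}
```

It would be called like grep16 in the other answer: grep_utf16 xxx filename_UTF-16, or grep_utf16 xxx filename_UTF-16 -n to add line numbers.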

Upvotes: 4
