Reputation: 7223
How can a text string be turned into UTF-8 encoded bytes using Bash and/or common Linux command line utilities? For example, in Python one would do:
"Six of one, ½ dozen of the other".encode('utf-8')
b'Six of one, \xc2\xbd dozen of the other'
Is there a way to do this in pure Bash:
STR="Six of one, ½ dozen of the other"
<utility_or_bash_command_here> --encoding='utf-8' $STR
'Six of one, \xc2\xbd dozen of the other'
Upvotes: 3
Views: 6839
Reputation: 5576
I adapted Machinexa's nice answer a little for my needs: encoding="utf-8" is the default, so there is no need to pass it, and I print a set of the encoded lines rather than a list or a concatenated bytestring:
alias encode='python3 -c "import sys; enc = sys.stdin.read().encode(); print(set(enc.splitlines()))"'
So then I can get a set without repetition:
printf "hell0\x0\nworld\n:-)\x0:-(\n" | \
grep -a "[[:cntrl:]]" -o | \
perl -pe 's/([^x\0-\x7f])/"\\x" . sprintf "%x", ord $1/ge' | \
encode
⇣
{b'\x00'}
And then, if you want to drop the Python bytes repr b'' and the backslash:
alias encode='python3 -c "from sys import stdin; encoded = stdin.read().encode(\"utf-8\"); s = set(encoded.splitlines()[:-1]); print({repr(char)[3:-1] for char in s})"'
which for the previous command gives {'x00'} instead.
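If the one-line aliases get hard to read, the same idea can be written as a small script. This is only a sketch of the approach above, not a drop-in replacement: the function names `distinct_encoded_lines` and `strip_bytes_repr` are made up for illustration, and unlike the second alias this version keeps the backslash when it strips the b'' wrapper.

```python
def distinct_encoded_lines(text: str) -> set:
    """Return the distinct lines of `text`, each encoded as UTF-8 bytes."""
    # str.encode() defaults to UTF-8; bytes.splitlines() drops the line endings.
    return set(text.encode().splitlines())


def strip_bytes_repr(chunk: bytes) -> str:
    """Drop the leading b' and the trailing ' from Python's bytes repr."""
    return repr(chunk)[2:-1]


# Two lines, each holding a single NUL character (hypothetical sample input).
sample = "\x00\n\x00\n"
print(distinct_encoded_lines(sample))
print({strip_bytes_repr(chunk) for chunk in distinct_encoded_lines(sample)})
```

To use it in a pipeline, you could pass `sys.stdin.read()` to `distinct_encoded_lines` instead of the hard-coded sample.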
Upvotes: 0
Reputation: 599
Python to the rescue!
alias encode='python3 -c "from sys import stdin; print(stdin.read().encode(\"utf-8\"))"'
root@kali-linux:~# echo "½ " | encode
b'\xc2\xbd \n'
Also, you can remove the b'' wrapper with some sed/awk thingy if you want.
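Instead of post-processing the repr, the escaped string from the question can also be built directly. A minimal sketch, assuming you want printable ASCII kept as-is and every other byte written as \xNN; the helper name `escape_utf8` is hypothetical, and unlike Python's bytes repr it writes a newline as \x0a and a literal backslash as \x5c:

```python
def escape_utf8(text: str) -> str:
    """Encode `text` as UTF-8 and render non-printable/non-ASCII bytes as \\xNN."""
    out = []
    for byte in text.encode("utf-8"):
        # Keep printable ASCII, except the backslash, which would be ambiguous.
        if 0x20 <= byte <= 0x7E and byte != 0x5C:
            out.append(chr(byte))
        else:
            out.append("\\x{:02x}".format(byte))
    return "".join(out)


print(escape_utf8("Six of one, ½ dozen of the other"))
# prints: Six of one, \xc2\xbd dozen of the other
```

Saved as a script that reads `sys.stdin.read()`, this could sit behind the same kind of alias as above.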
Upvotes: 4