andras.tim

Reputation: 2054

How can I fix UTF-8 string usage in bash?

I have a bash script that contains several variables holding UTF-8 strings. These variables are passed as parameters to a bash function in the script, which calls cp and a Python script with these parameters.

This script runs properly on my machine, but does not work on another one. I tried to debug it with set -x and other tools, but I cannot find the root cause, only this difference.

Here is a minimal example - like Plunker for JS ;)

  1. I have the following test.sh

    #!/bin/bash
    set -x
    
    function aaa() {
        echo "$1"
    }
    echo 'öüóőúéáűíÖÜÓŐÚÉÁŰÍ'
    aaa 'öüóőúéáűíÖÜÓŐÚÉÁŰÍ'
    
  2. I copy it to my two hosts

  3. The good one shows the following:

    + echo öüóőúéáűíÖÜÓŐÚÉÁŰÍ
    öüóőúéáűíÖÜÓŐÚÉÁŰÍ
    + aaa öüóőúéáűíÖÜÓŐÚÉÁŰÍ
    + echo öüóőúéáűíÖÜÓŐÚÉÁŰÍ
    öüóőúéáűíÖÜÓŐÚÉÁŰÍ
    
  4. However, the bad one shows this:

    + echo $'\303\266\303\274\303\263\305\221\303\272\303\251\303\241\305\261\303\255\303\226\303\234\303\223\305\220\303\232\303\211\303\201\305\260\303\215'
    öüóőúéáűíÖÜÓŐÚÉÁŰÍ
    + aaa $'\303\266\303\274\303\263\305\221\303\272\303\251\303\241\305\261\303\255\303\226\303\234\303\223\305\220\303\232\303\211\303\201\305\260\303\215'
    + echo $'\303\266\303\274\303\263\305\221\303\272\303\251\303\241\305\261\303\255\303\226\303\234\303\223\305\220\303\232\303\211\303\201\305\260\303\215'
    öüóőúéáűíÖÜÓŐÚÉÁŰÍ
    

Here are some details for debugging:

The working machine is an Ubuntu Trusty with bash=4.2-2ubuntu2.6, and the failing machine is an Ubuntu Precise with bash=4.3-7ubuntu1.5.

The locales are identical in both machines:

    $ locale
    LANG=en_US.UTF-8
    LANGUAGE=
    LC_CTYPE=en_US.UTF-8
    LC_NUMERIC=en_US.UTF-8
    LC_TIME=en_US.UTF-8
    LC_COLLATE=en_US.UTF-8
    LC_MONETARY=en_US.UTF-8
    LC_MESSAGES=POSIX
    LC_PAPER=en_US.UTF-8
    LC_NAME=en_US.UTF-8
    LC_ADDRESS=en_US.UTF-8
    LC_TELEPHONE=en_US.UTF-8
    LC_MEASUREMENT=en_US.UTF-8
    LC_IDENTIFICATION=en_US.UTF-8
    LC_ALL=

Updates

For more details, see this file: https://github.com/andras-tim/callrecord-renamer/blob/master/callrecord-renamer.py

Update2

I have checked: this error was caused independently of the bash code. The .ini file encoding was bad... Sorry to everyone who helped with debugging!
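For anyone hitting a similar problem, here is a minimal sketch of how a file's encoding could be checked from bash. It assumes `iconv` is installed; the helper name `is_valid_utf8` and the two sample files are hypothetical and are not part of the original script. Note that it only detects byte sequences that are not valid UTF-8; a file that decodes cleanly can still contain the wrong characters.

```shell
#!/bin/bash
# Hypothetical helper (not from the original script): succeeds only if
# the given file decodes cleanly as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Demo with two throwaway files; a real check would pass the .ini path.
tmpdir=$(mktemp -d)
printf 'name=\303\266\303\274\n' > "$tmpdir/good.ini"   # "öü" as UTF-8 bytes
printf 'name=\366\374\n' > "$tmpdir/bad.ini"            # "öü" as Latin-1 bytes

for f in "$tmpdir/good.ini" "$tmpdir/bad.ini"; do
    if is_valid_utf8 "$f"; then
        echo "${f##*/}: valid UTF-8"
    else
        echo "${f##*/}: not valid UTF-8"
    fi
done
rm -rf "$tmpdir"
```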

Upvotes: 0

Views: 2635

Answers (1)

that other guy

Reputation: 123470

You are comparing the xtrace debugging output of set -x. You cannot and should not expect bash's xtrace output to be in a certain format. If you want a specific format, you need to produce it yourself.
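One way to produce a trace format of your own is a small wrapper that prints the command and then runs it. This is only a sketch: the `trace` function is a hypothetical helper, not a bash built-in, and it prints arguments without any quoting, so arguments containing spaces are not distinguishable in its output.

```shell
#!/bin/bash
# Hypothetical helper: print a command line in a format we control,
# instead of relying on the version-dependent output of `set -x`.
trace() {
    # Write to stderr, as xtrace does; arguments are printed as-is.
    printf '+' >&2
    printf ' %s' "$@" >&2
    printf '\n' >&2
    # Run the command itself.
    "$@"
}

aaa() {
    echo "$1"
}

trace aaa 'öüóőúéáűíÖÜÓŐÚÉÁŰÍ'
```

Because the trace line is built with plain `printf '%s'`, the bytes are written unchanged and the terminal decides how to render them.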

If you look at the non-debug output of your script, it's identical on both machines.
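The two traces also describe the same data: the `$'\303\266...'` form is just the UTF-8 bytes written as octal escapes. A quick check, assuming the script file itself is saved as UTF-8 (the variable names are illustrative):

```shell
#!/bin/bash
# Assumes this script file is saved as UTF-8.
literal='öüóő'
# The same four characters as octal byte escapes, copied from the start
# of the xtrace line printed by the second machine.
escaped=$'\303\266\303\274\303\263\305\221'

if [ "$literal" = "$escaped" ]; then
    echo "same bytes"
else
    echo "different bytes"
fi

# Dump the bytes of the literal in octal for a visual comparison.
printf '%s' "$literal" | od -An -to1
```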

Upvotes: 2
