Rub1

Reputation: 3

Special characters problem in variable/file. Same string, different format from different sources

I have an encoding problem I can't wrap my head around. I'm also fairly new to Linux and bash, so bear with me.

Context/Example:

cat file1.txt
Foo ヅ

#file -i file1.txt: text/plain; charset=utf-8
#Source: website curl
cat file2.txt
Foo ヅ

#file -i file2.txt: text/plain; charset=utf-8
#Source: mysql database query (result is the import of file1.txt)

If I insert file1.txt into my database, it shows "Foo ヅ". I've tried all kinds of conversions, collations, etc. It never shows the correct characters in MySQL, but I'm fine with that.

The problem: I need to check if these strings are the same with an if statement:

var1=$(cat file1.txt)
var2=$(cat file2.txt)

if [ "$var1" != "$var2" ]; then
    : # stuff is done (the ":" no-op keeps the empty branch valid bash)
fi

I can't even remember all the things I've tried with iconv to convert either var1 or var2 so that they match and my if statement works as intended. The only workaround I have is to import file1.txt into another table in my DB and extract it again, but I'm working with a limited number of DB connections.
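
Since `file -i` reports utf-8 for both files, the charset label alone does not show where they differ. A minimal sketch for comparing the raw bytes, assuming bash and the standard `cmp` and `od` tools; `same_bytes` is a hypothetical helper name, not something from the question:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether two files are byte-identical.
# When they differ, dump each file as hex bytes so the difference is visible.
same_bytes() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
        od -An -tx1 "$1"   # hex bytes of the first file
        od -An -tx1 "$2"   # hex bytes of the second file
    fi
}
```

Calling `same_bytes file1.txt file2.txt` would print the bytes of both files side by side when they differ. Unlike the `$(cat ...)` comparison, `cmp` also counts trailing newlines, because command substitution strips them before the test runs.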

Any tips on how to solve this more easily are greatly appreciated!

Upvotes: 0

Views: 66

Answers (1)

Rub1

Reputation: 3

Thanks KamilCuk! The problem was the collation on the database itself (I didn't even know that was a thing).

Setting the database collation AND the table collation to utf8mb4_unicode_ci fixed the encoding on the import, and therefore the whole problem is solved.
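
For reference, a rough sketch of what that change can look like from bash. It only prints the two SQL statements; `collation_sql` and the database and table names passed to it are hypothetical placeholders, and the exact statements used in this case were not recorded:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print SQL that switches a database ($1) and one of
# its tables ($2) to utf8mb4 with the utf8mb4_unicode_ci collation.
# Nothing is executed here; the output is meant to be reviewed first.
collation_sql() {
    printf 'ALTER DATABASE `%s` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;\n' "$1"
    printf 'ALTER TABLE `%s`.`%s` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;\n' "$1" "$2"
}
```

The output could then be piped into the client, for example `collation_sql mydb mytable | mysql -u myuser -p`, where `mydb`, `mytable` and `myuser` are made-up names. `ALTER DATABASE` only changes the default for tables created afterwards, which is presumably why the table had to be changed as well.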

Upvotes: 0
