Special characters problem in variable/file. Same string, different format from different sources

Question

I have an encoding problem I can't wrap my head around. I'm also fairly new to Linux and Bash, so bear with me.

Context/Example:

cat file1.txt
Foo ヅ

#file -i file1.txt: text/plain; charset=utf-8
#Source: website curl

cat file2.txt
Foo ãƒ…

#file -i file2.txt: text/plain; charset=utf-8
#Source: mysql database query (result is the import of file1.txt)

If I insert file1.txt into my database, it shows "Foo ãƒ…". I've tried all kinds of conversions, collations, etc. It never shows the correct characters in MySQL, but I'm fine with that.
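
A minimal sketch of how this kind of garbling can arise, assuming the UTF-8 bytes of file1.txt were read as Windows-1252 (roughly what MySQL calls latin1) somewhere on the way into or out of the database and then encoded as UTF-8 a second time. That cause is an assumption, not something the output above proves; the variable names `original` and `garbled` are made up for illustration.

```shell
# "Foo ヅ" as raw UTF-8 bytes (ヅ is E3 83 85), written with octal
# escapes so the example does not depend on the terminal's encoding.
original=$(printf 'Foo \343\203\205')

# Misread those bytes as Windows-1252 and re-encode the result as UTF-8.
# E3 becomes ã, 83 becomes ƒ, 85 becomes …
garbled=$(printf '%s' "$original" | iconv -f WINDOWS-1252 -t UTF-8)

# Hex dumps show the difference even when both files report charset=utf-8.
printf '%s' "$original" | od -An -tx1
printf '%s' "$garbled"  | od -An -tx1
```

Both strings are valid UTF-8, which would explain why `file -i` reports utf-8 for both files even though their bytes differ.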

The problem: I need to check if these strings are the same with an if statement:

var1=$(cat file1.txt)
var2=$(cat file2.txt)

if [ "$var1" != "$var2" ]; then
    #stuff is done
fi

I can't even remember all the things I've tried with iconv to convert either var1 or var2 so that they match and my if statement works as intended. The only workaround I have is to import file1.txt into another table in my DB and extract it again, but I'm working with a limited number of DB connections.

Any tips on how to solve this more easily are greatly appreciated!
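
One possible approach, sketched under the assumption that file2.txt is simply file1.txt double-encoded through Windows-1252: converting file2.txt from UTF-8 "back" to Windows-1252 should restore the original UTF-8 bytes, so the comparison can run without another database round trip. The scratch directory and the `result` variable are hypothetical names used only for this example.

```shell
# Recreate the two files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf 'Foo \343\203\205\n' > "$tmpdir/file1.txt"                  # "Foo ヅ" as UTF-8
printf 'Foo \303\243\306\222\342\200\246\n' > "$tmpdir/file2.txt"  # "Foo ãƒ…" as UTF-8

var1=$(cat "$tmpdir/file1.txt")

# Undo the assumed double encoding: decode as UTF-8, write the characters
# out as single Windows-1252 bytes, which are the original UTF-8 bytes.
var2=$(iconv -f UTF-8 -t WINDOWS-1252 "$tmpdir/file2.txt")

if [ "$var1" != "$var2" ]; then
    result="different"
else
    result="same"
fi
echo "$result"

rm -rf "$tmpdir"
```

This may not work for every character: a few byte values (for example 0x81 and 0x8D) have no Windows-1252 mapping, so iconv can fail on strings whose UTF-8 encoding contains them. Checking iconv's exit status before trusting the comparison would be a sensible precaution.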
