Anas R.

Reputation: 367

Converting only non-UTF-8 files to UTF-8

I have a set of .md files. Some of them are UTF-8 encoded and the others are not (they are actually windows-1256).

I want to convert only the non-UTF-8 files to UTF-8.

The following script can partly do the job:

for file in *.md;
do
    iconv -f windows-1256 -t utf-8 "$file" -o "${file%.md}.🆕.md";
done

I still need to exclude the files that are already UTF-8 from this process (maybe using the file command?). Try the following command to see what I mean:

file --mime-encoding *

Note that although the file command isn't smart enough to detect the actual character set of the non-UTF-8 files, it's enough in this case that it can distinguish UTF-8 files from non-UTF-8 ones.
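The distinction described here can be wrapped in a small function. This is only a sketch, assuming a `file` that supports `-b` (`--brief`) and `--mime-encoding`; the name `needs_conversion` is hypothetical. One detail worth handling: `file` reports pure-ASCII files as `us-ascii` rather than `utf-8`, and since ASCII is a subset of UTF-8 those files can be left alone too.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds (exit 0) when FILE needs converting, i.e. when
# `file` reports an encoding other than utf-8 or us-ascii.
# -b (--brief) omits the "name: " prefix, so the file name itself can never
# match the pattern (think of a file called utf-8-notes.md).
needs_conversion() {
    case "$(file -b --mime-encoding "$1")" in
        utf-8 | us-ascii) return 1 ;;  # already valid UTF-8 (ASCII is a subset)
        *) return 0 ;;                 # e.g. iso-8859-1 or unknown-8bit
    esac
}

# Usage with the loop above:
# for file in *.md; do
#     if needs_conversion "$file"; then
#         iconv -f windows-1256 -t utf-8 "$file" -o "${file%.md}.🆕.md"
#     fi
# done
```

Matching on the whole `-b` output with `case` keeps the decision on the encoding alone, instead of on a line that also contains the file name.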

Thanks in advance for your help.

Upvotes: 1

Views: 765

Answers (2)

Joona

Reputation: 11

As a beginner with shell commands, I had some difficulty applying the proposed answer in practice. I wrote it out in a general form, with all the parts together, as follows:

mkdir -p new; for file in *.txt; do if file --mime-encoding "$file" | grep -v -q utf-8 ; then iconv -f iso-8859-1 -t utf-8 "$file" > new/"$file"; fi ; done

Using *.txt avoids picking up subfolders, which would otherwise cause errors in iconv. Instead of the -o option, which I wasn't able to use, I redirected the output with >, and writing to new/"$file" lets me keep the original file names.

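One side effect of redirecting with `>`: the shell creates the target file before iconv even starts, so a failed conversion still leaves an empty or partial file in the output folder. The sketch below packages the redirect approach with that cleanup and creates the folder first. The function name `convert_into_dir` is hypothetical, and the encoding check is left to the caller, as in the one-liner above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: convert_into_dir FROM_ENCODING OUT_DIR FILE
# Writes the UTF-8 version of FILE to OUT_DIR/<basename of FILE> using a
# redirect instead of iconv's -o option.
convert_into_dir() {
    local from=$1 outdir=$2 file=$3
    local target="$outdir/$(basename "$file")"
    mkdir -p "$outdir"
    # The shell opens (and truncates) $target before iconv runs.
    if iconv -f "$from" -t UTF-8 "$file" > "$target"; then
        return 0
    fi
    # Conversion failed: do not leave an empty or partial file behind.
    rm -f "$target"
    return 1
}

# Usage, mirroring the one-liner above:
# for file in *.txt; do
#     if file --mime-encoding "$file" | grep -v -q utf-8; then
#         convert_into_dir iso-8859-1 new "$file"
#     fi
# done
```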
Upvotes: 0

Joni

Reputation: 111389

You can use, for example, an if statement:

if file --mime-encoding "$file" | grep -v -q utf-8 ; then
    iconv -f windows-1256 -t utf-8 "$file" -o "${file%.md}.🆕.md";
fi

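With -v, grep selects the lines that do not contain utf-8, and -q makes it print nothing and report only through its exit status: 0 (success) if at least one line was selected, 1 (failure) otherwise. For a UTF-8 file, the single line printed by file contains utf-8, so nothing is selected and grep fails. The if statement tests this exit status and skips the conversion.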
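The exit-status logic can be tried without any real files by feeding grep lines shaped like the output of `file --mime-encoding`. A minimal sketch; the function name `would_convert` and the sample lines are illustrative, not taken from a real run.

```shell
#!/usr/bin/env bash

# Hypothetical helper: applies the same test as the if statement above to one
# line of `file --mime-encoding` style output.
# grep -v selects lines that do NOT contain "utf-8"; -q suppresses output, so
# only the exit status remains: 0 if a line was selected, 1 if none was.
would_convert() {
    if printf '%s\n' "$1" | grep -v -q utf-8; then
        echo convert
    else
        echo skip
    fi
}

would_convert 'a.md: utf-8'         # prints "skip"
would_convert 'b.md: unknown-8bit'  # prints "convert"
```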

Upvotes: 1
