Reputation: 367
I have a set of md files. Some of them are utf-8 encoded, and others are not (they are actually windows-1256).
I want to convert only non-utf-8 files to utf-8.
The following script can partly do the job:
for file in *.md; do
    iconv -f windows-1256 -t utf-8 "$file" -o "${file%.md}.🆕.md"
done
I still need to exclude the files that are already utf-8 from this process (maybe using the file command?). Try the following command to see what I mean:
file --mime-encoding *
Note that although the file command isn't smart enough to detect the correct character set of the non-utf-8 files, in this case it is enough that it can tell utf-8 files apart from non-utf-8 ones.
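If you would rather not depend on how well file guesses encodings, one alternative (a different technique from the file-based check discussed in this thread) is to let iconv decide: converting from UTF-8 to UTF-8 fails as soon as it meets an invalid byte sequence, so its exit status says whether a file is valid UTF-8. This is a minimal sketch assuming GNU (glibc) iconv; is_utf8 and convert_md are hypothetical helper names, and every non-UTF-8 file is assumed to be windows-1256, as in the question.

```shell
#!/bin/sh
# is_utf8 FILE (hypothetical helper): exit status 0 if FILE is valid UTF-8.
# Assumes glibc iconv, which exits non-zero on an invalid byte sequence;
# the converted output is thrown away, only the exit status matters.
is_utf8() {
    iconv -f utf-8 -t utf-8 "$1" >/dev/null 2>&1
}

# convert_md (hypothetical name): convert every *.md file in the current
# directory that is NOT valid UTF-8, assuming such files are windows-1256.
convert_md() {
    for file in *.md; do
        [ -e "$file" ] || continue      # no match: the glob stays literal
        if ! is_utf8 "$file"; then
            iconv -f windows-1256 -t utf-8 "$file" -o "${file%.md}.🆕.md"
        fi
    done
}
```

Run convert_md from the folder that holds the files. Pure-ASCII files pass the check and are skipped, which is harmless because ASCII bytes mean the same thing in windows-1256 and UTF-8. The flip side is that a windows-1256 file whose bytes happen to form valid UTF-8 would be skipped as well.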
Thanks in advance for your help.
Upvotes: 1
Views: 765
Reputation: 11
As a beginner with shell commands, I had some difficulty applying the accepted answer in practice. I put all the parts together in a more general form, as follows:
for file in *.txt; do if file --mime-encoding "$file" | grep -v -q utf-8 ; then iconv -f iso-8859-1 -t utf-8 "$file" > new/"$file"; fi ; done
Using "*.txt" skips any subfolders, which would otherwise cause errors in iconv. Instead of the -o option, which I couldn't get to work, I used ">" to redirect the output, and the new/"$file" structure lets me keep the original file names. The new folder has to exist before the loop runs, because ">" will not create it.
Upvotes: 0
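Spread over several lines, the one-liner from the answer above might look like the sketch below. Two changes are assumptions of this sketch rather than part of that answer: mkdir -p creates the output folder first, and file -b prints only the encoding, so a file whose name contains utf-8 (for example utf-8-notes.txt) is not mistaken for a UTF-8 file. to_utf8_dir is a hypothetical function name, and iso-8859-1 is kept as the source encoding from the one-liner.

```shell
#!/bin/sh
# to_utf8_dir (hypothetical name): convert every non-UTF-8 *.txt file in the
# current directory to UTF-8, writing it to new/ under the same file name.
# Assumption: the non-UTF-8 files are iso-8859-1, as in the one-liner.
to_utf8_dir() {
    mkdir -p new                           # ">" will not create new/ by itself
    for file in *.txt; do
        [ -f "$file" ] || continue         # skip directories and an unmatched glob
        # -b (brief): print only the encoding, without the "name: " prefix
        enc=$(file -b --mime-encoding "$file")
        if [ "$enc" != "utf-8" ]; then
            iconv -f iso-8859-1 -t utf-8 "$file" > new/"$file"
        fi
    done
}
```

As in the one-liner, files reported as us-ascii are still passed through iconv, which copies them unchanged into new/.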
Reputation: 111389
You can use an if statement, for example:
if file --mime-encoding "$file" | grep -v -q utf-8; then
    iconv -f windows-1256 -t utf-8 "$file" -o "${file%.md}.🆕.md"
fi
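If grep doesn't find a match, it returns a non-zero exit status, which signals failure. The if statement tests that exit status: with -v, grep only "matches" lines that do not contain utf-8, so iconv runs only for files that file does not report as utf-8.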
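To see the exit status that the if statement tests, you can feed grep lines shaped like the output of file --mime-encoding. The lines below are made-up samples, and needs_conversion is a hypothetical helper. With -v, grep succeeds only if at least one line does not contain utf-8, and -q suppresses the output so that only the status is left.

```shell
#!/bin/sh
# needs_conversion LINE (hypothetical helper): applies the answer's test
# to one line shaped like `file --mime-encoding` output.
# Exit status 0 means "this line does not mention utf-8".
needs_conversion() {
    printf '%s\n' "$1" | grep -v -q utf-8
}

for line in 'a.md: utf-8' 'b.md: iso-8859-1' 'utf-8-notes.md: iso-8859-1'; do
    if needs_conversion "$line"; then
        echo "convert: $line"
    else
        echo "skip:    $line"
    fi
done
```

This prints skip for a.md, convert for b.md, and skip for utf-8-notes.md. Because file --mime-encoding also prints the file name, a name containing utf-8 makes grep -v select nothing even when the file is not UTF-8; file -b --mime-encoding prints only the encoding and avoids that.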
Upvotes: 1