AnnaBea

Reputation: 11

How to convert files in Unix using iconv?

I'm new to Bash scripting. I have a requirement to convert multiple input files in UTF-8 encoding to ISO 8859-1.

I am using the below command, which is working fine for the conversion part:

cd ${DIR_INPUT}/
for f in *.txt; do iconv -f UTF-8 -t ISO-8859-1 $f > ${DIR_LIST}/$f; done

However, when I don't have any text files in my input directory ($DIR_INPUT), it still creates an empty .txt file in my output directory ($DIR_LIST).

How can I prevent this from happening?

Upvotes: 1

Views: 1053

Answers (2)

Ruslan Osmanov

Reputation: 21502

As @ghoti pointed out, when no files match a wildcard expression such as a*, the pattern itself becomes the result of pathname expansion. By default (when the nullglob option is unset), a* expands to the literal string a*.

You can set the nullglob option, of course. Be aware, though, that it affects all subsequent pathname expansions unless you unset the option after the loop.
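One way to limit the effect of nullglob to a single loop is to record its previous state and restore it afterwards. The following is only a sketch and requires Bash, since shopt is a Bash builtin; the temporary directory and the variable names (dir, was_set, count, after) are made up for illustration:

```shell
#!/usr/bin/env bash
# Sketch: enable nullglob for one loop only, then restore the earlier state.
# The temporary directory stands in for an empty input directory.
dir=$(mktemp -d)

# Remember whether nullglob was already on (shopt -q returns 0 if it is set).
if shopt -q nullglob; then was_set=1; else was_set=0; fi

shopt -s nullglob
count=0
for f in "$dir"/*.txt; do
  count=$((count + 1))   # never reached when nothing matches
done

# Restore the previous state so later globs behave as before.
if [ "$was_set" -eq 0 ]; then shopt -u nullglob; fi

# With nullglob off again, an unmatched pattern stays literal.
after=$(echo "$dir"/*.txt)
echo "iterations: $count"
rmdir "$dir"
```

Running the loop inside a subshell, ( shopt -s nullglob; for ...; done ), would be another way to keep the option from leaking, at the cost of losing any variables set inside the loop.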

I would rather use the find command, which has a clear interface (and, in my opinion, is less likely to perform implicit conversions than Bash globbing). For example:

cmd='iconv --verbose -f UTF-8 -t ISO-8859-1 "$0" > "$1/$(basename "$0")"'

find "${DIR_INPUT}/" \
    -mindepth 1 \
    -maxdepth 1 \
    -type f \
    -name '*.txt' \
    -exec sh -c "$cmd" {} "${DIR_LIST}" \;

In the example above, $0 and $1 are the positional arguments holding the file path and ${DIR_LIST}, respectively. The command is invoked via the standard shell (sh) because the file path {} has to be referred to twice. Although most modern implementations of find handle multiple occurrences of {} correctly, the POSIX specification states:

If more than one argument containing the two characters "{}" is present, the behavior is unspecified.
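To see how sh -c assigns the arguments that follow the command string, here is a small sketch. The file name and directory are hypothetical values chosen only for the demonstration; the first argument after the command string becomes $0 and the next one becomes $1:

```shell
# Sketch: arguments after the command string are bound to $0, $1, ...
# "my file.txt" and "/tmp/out" are made-up values for illustration.
out=$(sh -c 'printf "%s|%s\n" "$0" "$1"' "my file.txt" "/tmp/out")
echo "$out"
```

Because the values arrive as separate arguments rather than being pasted into the command string, a file name containing spaces stays intact as long as "$0" and "$1" are quoted inside the string.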

As with the for loop, the -name primary evaluates as true if the basename of the current pathname matches the operand (*.txt) using the pattern matching notation. Unlike the for loop, however, filename expansion does not apply: this is a matching operation, not an expansion, so nothing is produced when no file matches.
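The difference can be checked with a quick experiment. This is a sketch using a throwaway directory; the names in_dir, matches and matches_after are illustrative only:

```shell
# Sketch: find yields nothing for an unmatched pattern, unlike a default glob.
in_dir=$(mktemp -d)

# Empty directory: the pattern matches no file, so the output is empty.
matches=$(find "$in_dir" -type f -name '*.txt')

# Add one matching and one non-matching file, then search again.
touch "$in_dir/a.txt" "$in_dir/b.log"
matches_after=$(find "$in_dir" -type f -name '*.txt')

echo "before: [$matches]"
echo "after:  [$matches_after]"
rm -rf "$in_dir"
```

Since find prints nothing in the empty case, the -exec command is never started and no stray output file can appear.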

Upvotes: 1

ghoti

Reputation: 46876

The empty file *.txt is created in your output directory because, by default, bash leaves an unmatched glob pattern as the literal string that you supplied. You can change this behaviour in a number of ways, but what you're probably looking for is shopt -s nullglob. Observe:

$ for i in a*; do echo "$i"; done
a*
$ shopt -s nullglob
$ for i in a*; do echo "$i"; done
$

You can find documentation about this in the bash man page under Pathname Expansion.

In your case, I'd probably rewrite this in this way:

shopt -s nullglob
for f in "$DIR_INPUT"/*.txt; do
  iconv -f UTF-8 -t ISO-8859-1 "$f" > "${DIR_LIST}/${f##*/}"
done

This avoids the need for the initial cd, and uses parameter expansion to strip the path portion of $f for the output redirection. With nullglob set, the loop body never runs when nothing matches, so no work is done on a nonexistent file and no empty output file is created.
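For anyone unfamiliar with ${f##*/}, the sketch below shows what it does on a made-up path. The expansion removes the longest prefix matching */, which leaves the same result as basename here without starting an extra process:

```shell
# Sketch: strip the directory part of a path with parameter expansion.
# The path is hypothetical and does not need to exist.
f="/data/input/report.txt"

name=${f##*/}          # remove the longest prefix ending in "/"
base=$(basename "$f")  # external command, shown for comparison

echo "$name"
echo "$base"
```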

Upvotes: 1
