Unable to execute awk command in a function, but working directly in the shell

Question

I want to create a utility function for bash to remove duplicate lines. I am using this function:

function remove_empty_lines() {
  if ! command -v awk &> /dev/null
  then
    echo '[x] ERR: "awk" command not found'
    return
  fi

  if [[ -z "$1" ]]
  then
    echo "usage: remove_empty_lines  [--replace]"
    echo
    echo "Arguments:"
    echo -e "\t--replace\t (Optional) If not passed, the result will be redirected to stdout"
    return
  fi

  if [[ ! -f "$1" ]]
  then
    echo "[x] ERR: \"$1\" file not found"
    return
  fi
  echo $0
  local CMD="awk '!seen[$0]++' $1"

  if [[ "$2" = '--reload' ]]
  then
    CMD+=" > $1"
  fi

  echo $CMD
}

Running the awk command directly in the shell works. But when I go through $CMD in the function, the command is not executed and I get this output instead:

$ remove_empty_lines app.js
/bin/bash
awk '!seen[/bin/bash]++' app.js

Charles Duffy · Accepted Answer

The original code is broken in several ways:

When used with --reload, it would truncate the output file's contents before awk could ever read those contents (see "How can I use a file in a command and redirect output to the same file without truncating it?").
It didn't ever actually run the command, and for the reasons described in BashFAQ #50, storing a shell command in a string is inherently buggy (one can work around some of those issues with eval; BashFAQ #48 describes why doing so introduces security bugs).
It wrote error messages (and other "diagnostic content") to stdout instead of stderr; this means that if your function's output was redirected to a file, you could never see its errors -- they'd end up jumbled into the output.
Error cases were handled with a return even in cases where $? would be zero; this means that return itself would return a zero/successful/truthy status, not revealing to the caller that any error had taken place.
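
To make the middle points concrete, here is a minimal sketch rather than a definitive implementation. The names dedupe_demo, sample, broken, deduped, captured and rc are hypothetical and chosen only for illustration. It shows that double quotes let the shell expand $0 before awk sees the program, that a function can hold the command instead of a string, and that an error written to stderr with a non-zero return stays out of captured output while still being visible to the caller.

```bash
# Illustrative only: dedupe_demo, sample, broken, deduped, captured and rc
# are hypothetical names, not part of the original function.
sample=$(mktemp) || exit 1
printf '%s\n' a b a c b >"$sample"

# Double quotes: the shell substitutes $0 (the shell or script name)
# before awk ever sees the program text.
broken="awk '!seen[$0]++' $sample"
printf 'string form: %s\n' "$broken"

# A function holds the command instead of a string. Single quotes keep $0
# literal for awk; the error goes to stderr with a non-zero status.
dedupe_demo() {
  [ -f "$1" ] || { echo "[x] ERR: \"$1\" file not found" >&2; return 1; }
  awk '!seen[$0]++' <"$1"
}

deduped=$(dedupe_demo "$sample")
printf '%s\n' "$deduped"

# On failure, captured stdout stays empty and the status is non-zero.
rc=0
captured=$(dedupe_demo "$sample.missing" 2>/dev/null) || rc=$?
printf 'captured=[%s] rc=%s\n' "$captured" "$rc"

rm -f -- "$sample"
```

The string form prints with the shell's name in place of $0, which matches the output shown in the question.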

Presumably you stored the command in CMD so that you could perform the redirection conditionally, but that can be done in other ways. Below, we always create a file descriptor out_fd, but point it either to stdout (when called without --reload) or to a temporary file (when called with --reload). If and only if awk succeeds, we then move the temporary file over the output file, replacing it as an atomic operation.

remove_empty_lines() {
  local out_fd rc=0 tempfile=
  command -v awk &>/dev/null || { echo '[x] ERR: "awk" command not found' >&2; return 1; }

  if [[ -z "$1" ]]; then
    printf '%b\n' >&2 \
      'usage: remove_empty_lines  [--replace]' \
      '' \
      'Arguments:' \
      '\t--replace\t(Optional) If not passed, the result will be redirected to stdout'
    return 1
  fi

  [[ -f "$1" ]] || { echo "[x] ERR: \"$1\" file not found" >&2; return 1; }

  if [ "$2" = --reload ]; then
    # Create the temp file next to "$1" so the later mv stays on one filesystem
    tempfile=$(mktemp -- "$1.XXXXXX") || return
    exec {out_fd}>"$tempfile" || { rc=$?; rm -f "$tempfile"; return "$rc"; }
  else
    exec {out_fd}>&1
  fi

  awk '!seen[$0]++' <"$1" >&$out_fd || { rc=$?; rm -f "$tempfile"; return "$rc"; }
  exec {out_fd}>&- # close our file descriptor

  if [[ $tempfile ]]; then
    mv -- "$tempfile" "$1" || return
  fi
}
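
The truncation problem, and the write-to-a-temp-file-then-rename approach that avoids it, can be reproduced in isolation. This is a sketch under stated assumptions: the scratch directory and the names work, tmp, naive_result and safe_result are hypothetical and do not come from the function above.

```bash
# Hypothetical scratch files; none of these names come from the answer.
work=$(mktemp -d) || exit 1
printf '%s\n' a b a >"$work/naive.txt"
printf '%s\n' a b a >"$work/safe.txt"

# Naive in-place rewrite: the shell truncates naive.txt before awk opens it,
# so awk reads an empty file and writes nothing back.
awk '!seen[$0]++' "$work/naive.txt" >"$work/naive.txt"
naive_result=$(cat "$work/naive.txt")

# Write to a temp file in the same directory, then rename it over the
# original only if awk succeeded.
tmp=$(mktemp "$work/safe.txt.XXXXXX") || exit 1
awk '!seen[$0]++' <"$work/safe.txt" >"$tmp" && mv -- "$tmp" "$work/safe.txt"
safe_result=$(cat "$work/safe.txt")

printf 'naive: [%s]\n' "$naive_result"
printf 'safe: [%s]\n' "$safe_result"
rm -rf -- "$work"
```

Keeping the temporary file in the same directory as the target means mv is a rename within one filesystem, which is what makes the replacement atomic.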
