Riccardo

Reputation: 413

How can I append a string to a line when certain conditions are met?

I'm handling large .txt files, and we are trying to identify which ones do not comply with the required number of characters per line (80 characters at most).

For the sake of this example, let's say every line needs exactly 10 characters. I need to append "(+number of extra characters)" or "(-number of missing characters)" to each line that does not have exactly 10 characters.

Here is what I have so far:

while IFS='' read -r line || [[ -n "$line" ]]; do
  if [[ "${#line}" -gt 10 ]]; then
    echo "Mo dan 10 D: ${#line}"
  elif [[ "${#line}" -lt 10 ]]; then
    echo "Less dan 10 D: ${#line}"
  fi

done < "$1"

I'm stuck on finding a way to append the two strings I'm echoing to the corresponding lines so we can identify them.

I looked into awk and sed, but haven't been able to loop through the entire .txt file, count the characters in every line, and append a string with the appropriate message.

I would appreciate some assistance, either as a shell script or as an awk or sed solution. Thank you.

Edit: This is an example input file (note that whitespace also counts as characters):

Line 1****
Line 2*****
Line 3*
Line 4****
Line 5****
Line 6**
Line 7****
Line 8********
Line 9****

This is the desired output:

Line 1****
Line 2*****(+1)
Line 3*(-3)
Line 4****
Line 5****
Line 6**(-2)
Line 7****
Line 8********(+4)
Line 9****

Upvotes: 1

Views: 574

Answers (3)

mklement0

Reputation: 437648

For performance reasons, using a shell loop to process the lines of a file is the wrong approach (unless the file is very small).

A text-processing utility such as awk is the much better choice:

awk -v targetLen=10 '
  diff = length($0) - targetLen { # input line ($0) does not have the expected length
    $0 = $0 "(" (diff > 0 ? "+" : "") diff ")" # append diff (with +, if positive)
  }
  1  # Print the (possibly modified) line.
' <<'EOF'  # sample input as a here-document
1234567890
123456789
123456789012
EOF

This yields:

1234567890
123456789(-1)
123456789012(+2)
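To run the same logic against a real file instead of a here-document, the awk program can be wrapped in a small shell function. This is only a sketch: the function name annotate_lengths and its argument order are assumptions for illustration and are not part of the answer above. It reads the named files, or standard input when no file is given.

```shell
# Hypothetical helper (name and argument order are illustrative assumptions).
# Usage: annotate_lengths TARGET_LEN [FILE...]
annotate_lengths() {
  local target=$1
  shift
  awk -v targetLen="$target" '
    { diff = length($0) - targetLen }                           # signed difference for this line
    diff != 0 { $0 = $0 "(" (diff > 0 ? "+" : "") diff ")" }    # append (+N) or (-N)
    { print }                                                   # print the (possibly modified) line
  ' "$@"
}
```

For the 80-character case from the question, a call such as annotate_lengths 80 input.txt > output.txt would write the annotated copy to output.txt (both file names are placeholders).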

Caveat: The BSD/macOS awk implementation is not locale-aware, so its length function counts bytes, which will only work as intended with ASCII-range characters.
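One way to find out how a given awk counts is to feed it a single non-ASCII character and look at the reported length. The sketch below assumes UTF-8 input; the function name awk_length_unit is hypothetical. The result depends on both the awk implementation and the active locale, so a locale-aware awk running under the C locale will also report bytes.

```shell
# Hypothetical check (function name is an illustrative assumption).
# "\303\251" is the UTF-8 encoding of "é": one character, two bytes.
awk_length_unit() {
  local n
  n=$(printf '\303\251\n' | awk '{ print length($0) }')
  if [ "$n" -eq 1 ]; then
    echo "characters"   # length() counted the single character
  else
    echo "bytes"        # length() counted the individual bytes
  fi
}
```

If this prints "bytes" and the files contain non-ASCII text, the appended differences will reflect byte counts rather than character counts.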

Upvotes: 3

yosefrow

Reputation: 2268

I based my answer on your original script:

#!/bin/bash

while IFS='' read -r line || [[ -n "$line" ]]; do
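  nchars=${#line}
  target=10
  if [[ $nchars -gt $target ]]; then
    echo "${line}(+$((nchars - target)))"   # too long: append the surplus as (+N)
  elif [[ $nchars -lt $target ]]; then
    echo "${line}(-$((target - nchars)))"   # too short: append the shortfall as (-N)
  else
    echo "$line"                            # exact length: print unchanged
  fi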

done < "$1"

Simply use it like this: bash evalscript inputfile > outputfile

Upvotes: 0

Jack

Reputation: 6158

$ cat lines.in
Line 1****
Line 2*****
Line 3*
Line 4****
Line 5****
Line 6**
Line 7****
Line 8********
Line 9****

$ cat lines.sh
#!/bin/bash
mark=10
while IFS='' read -r line || [[ -n "$line" ]]; do
    diff=$(( ${#line} - mark ))
    if [ ${diff} -eq 0 ]; then
        echo "${line}"
    else
        printf "%s(%+d)\n" "${line}" "${diff}"
    fi
done < "$1"

$ ./lines.sh lines.in
Line 1****
Line 2*****(+1)
Line 3*(-3)
Line 4****
Line 5****
Line 6**(-2)
Line 7****
Line 8********(+4)
Line 9****

Upvotes: 0
