Reputation: 413
I'm handling large .txt files, and we are trying to identify which ones do not comply with the required number of characters per line (80 characters at most).
For the sake of this example, let's say that every line needs exactly 10 characters. For each line that does not have exactly 10 characters, I need to append "(+number of extra characters)" or "(-number of missing characters)".
Here is what I have so far:
while IFS='' read -r line || [[ -n "$line" ]]; do
if [[ "${#line}" -gt 10 ]]; then
echo "Mo dan 10 D: ${#line}"
elif [[ "${#line}" -lt 10 ]]; then
echo "Less dan 10 D: ${#line}"
fi
done < "$1"
I'm stuck on finding a way to append the two strings I'm echoing to the corresponding lines so that we can identify them.
I researched awk and sed but haven't been able to loop through the entire .txt file, count the number of characters in every line, and append a string with the appropriate message.
I would appreciate some assistance, either as a shell script or as an awk or sed solution. Thank you.
Edit: This is an example input file (note that white space also counts as characters):
Line 1****
Line 2*****
Line 3*
Line 4****
Line 5****
Line 6**
Line 7****
Line 8********
Line 9****
This is the desired output:
Line 1****
Line 2*****(+1)
Line 3*(-3)
Line 4****
Line 5****
Line 6**(-2)
Line 7****
Line 8********(+4)
Line 9****
Upvotes: 1
Views: 574
Reputation: 437648
For performance reasons, using a shell loop to process the lines of a file is the wrong approach (unless the file is very small).
A text-processing utility such as awk is the much better choice:
awk -v targetLen=10 '
diff = length($0) - targetLen { # input line ($0) does not have the expected length
$0 = $0 "(" (diff > 0 ? "+" : "") diff ")" # append diff (with +, if positive)
}
1 # Print the (possibly modified) line.
' <<'EOF' # sample input as a here-document
1234567890
123456789
123456789012
EOF
This yields:
1234567890
123456789(-1)
123456789012(+2)
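To run the same program against a real file instead of a here-document, it can be wrapped in a small shell function. This is only a sketch: the function name `annotate_lengths` and its argument order are made up for illustration, and the only change to the awk code is that the pattern is parenthesized for readability.

```shell
# Hypothetical helper (name and argument order are assumptions):
#   annotate_lengths FILE [TARGET_LEN]
# TARGET_LEN defaults to 10, as in the example above.
annotate_lengths() {
awk -v targetLen="${2:-10}" '
(diff = length($0) - targetLen) { # nonzero diff: the line has the wrong length
$0 = $0 "(" (diff > 0 ? "+" : "") diff ")" # append diff (with +, if positive)
}
1 # print the (possibly modified) line
' "$1"
}
```

For the real 80-character limit, a call would look like `annotate_lengths inputfile 80 > outputfile`.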
Caveat: The BSD/macOS awk implementation is not locale-aware, so its length function counts bytes, which will only work as intended with ASCII-range characters.
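To see why byte counting matters, here is a small shell sketch that measures the same string both ways. The helper names `byte_len` and `char_len` are made up for illustration, and the test string is just an assumed example of non-ASCII input.

```shell
# "café" encoded as UTF-8: 4 characters, but 5 bytes (é is \303\251).
s=$(printf 'caf\303\251')

# Byte count: force the C locale inside a subshell so the caller's locale is untouched.
byte_len() ( LC_ALL=C; printf '%s\n' "${#1}" )

# Length as seen in the current locale. In bash with a UTF-8 locale this is a
# character count; some other shells always count bytes here.
char_len() { printf '%s\n' "${#1}"; }

byte_len "$s"
char_len "$s"
```

If the two numbers differ for your data, a byte-counting awk will report such lines as too long even though they look fine.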
Upvotes: 3
Reputation: 2268
I based my answer on your original script:
#!/bin/bash
while IFS='' read -r line || [[ -n "$line" ]]; do
nchars=${#line}
target=10
if [[ $nchars -gt $target ]]; then
echo "${line}(+$((nchars-target)))"
elif [[ $nchars -lt $target ]]; then
echo "${line}(-$((target-nchars)))"
else
echo "$line"
fi
done < "$1"
Use it like this: bash evalscript inputfile > outputfile
Upvotes: 0
Reputation: 6158
$ cat lines.in
Line 1****
Line 2*****
Line 3*
Line 4****
Line 5****
Line 6**
Line 7****
Line 8********
Line 9****
$ cat lines.sh
#!/bin/bash
mark=10
while IFS='' read -r line || [[ -n "$line" ]]; do
diff=$(( ${#line} - mark ))
if [ ${diff} -eq 0 ]; then
echo "${line}"
else
printf "%s(%+d)\n" "${line}" "${diff}"
fi
done < "$1"
$ ./lines.sh lines.in
Line 1****
Line 2*****(+1)
Line 3*(-3)
Line 4****
Line 5****
Line 6**(-2)
Line 7****
Line 8********(+4)
Line 9****
Upvotes: 0