Reputation: 35
I'm not a programmer myself, but I developed a shell script that reads a positional (fixed-width) file and, based on the single letter at position 16, copies the whole line to another file.
Example:
INPUT FILE
003402841000011A10CNPJ08963394000195
003402841000041B20CNPJ08963394000195 16012020XX5313720087903007
003402841000011A10CNPJ08963394000195
003402841000041B20CNPJ08963394000195 16012020XX5313720087903007
OUTPUT FILE A
003402841000011A10CNPJ08963394000195
003402841000011A10CNPJ08963394000195
OUTPUT FILE B
003402841000041B20CNPJ08963394000195 16012020XX5313720087903007
003402841000041B20CNPJ08963394000195 16012020XX5313720087903007
The code I currently have:
#!/usr/bin/env bash
ARQ_IN="$1";
DIR_OUT="C:/Users/etc/etc/";
while IFS= read -r line || [[ -n "$line" ]];
do
    SUBSTRING=$(echo "$line" | cut -c16);
    if [ "$SUBSTRING" == "A" ]
    then
        echo "$line" >> "$DIR_OUT"arqA.txt;
    else
        if [ "$SUBSTRING" == "B" ]
        then
            echo "$line" >> "$DIR_OUT"arqB.txt;
        else
            if [ "$SUBSTRING" == "K" ]
            then
                echo "$line" >> "$DIR_OUT"arqK.txt;
            else
                if [ "$SUBSTRING" == "1" ]
                then
                    echo "$line" >> "$DIR_OUT"arq1.txt;
                fi
            fi
        fi
    fi
done < "$ARQ_IN"
Although this code works, it is not as fast as I need: the input file has around 400k records.
Can someone help me write new code or improve this one?
Upvotes: 2
Views: 132
Reputation: 246807
Yes, bash while-read loops can be pretty slow, and there's no need to call out to cut to get a substring. Try this:
while IFS= read -r line || [[ -n "$line" ]]; do
    # the offset is zero-based, so use 15 not 16
    letter=${line:15:1}
    case "$letter" in
        [ABK1]) echo "$line" >> "${DIR_OUT}arq${letter}.txt" ;;
    esac
done < "$ARQ_IN"
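To make the off-by-one between the two approaches concrete, here is a small sketch that extracts the same character both ways from one of the question's sample records. The variable names with_cut and with_expansion are only illustrative.

```shell
#!/usr/bin/env bash
# Sample record taken from the question's input file.
line='003402841000011A10CNPJ08963394000195'

# cut numbers characters from 1, so position 16 is -c16.
with_cut=$(printf '%s\n' "$line" | cut -c16)

# ${var:offset:length} numbers characters from 0, so the same character is at offset 15.
with_expansion=${line:15:1}

printf 'cut: %s, expansion: %s\n' "$with_cut" "$with_expansion"
# prints: cut: A, expansion: A
```

The expansion is done by bash itself, so no extra process is started per line, which is where most of the time goes with 400k records.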
For cascading if / else-if chains, use elif:
if some condition; then
    some action
elif some other condition; then
    some other action
...
else
    some default action
fi
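As an illustration, the nested tests from the question could be flattened with elif roughly like this. It is a sketch only: classify is a hypothetical helper that prints the output file name for one record, and it is not part of the original script.

```shell
#!/usr/bin/env bash
# classify (hypothetical helper): print the target file name for one record,
# or nothing when the letter at position 16 is not A, B, K or 1.
classify() {
    local letter=${1:15:1}
    if [[ $letter == "A" ]]; then
        echo "arqA.txt"
    elif [[ $letter == "B" ]]; then
        echo "arqB.txt"
    elif [[ $letter == "K" ]]; then
        echo "arqK.txt"
    elif [[ $letter == "1" ]]; then
        echo "arq1.txt"
    fi
}

# Two sample records from the question.
classify '003402841000011A10CNPJ08963394000195'
classify '003402841000041B20CNPJ08963394000195 16012020XX5313720087903007'
```

Each branch sits at the same indentation level and a single fi closes the whole chain, instead of one fi per nested if.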
Upvotes: 2
Reputation: 133518
This is a job for awk. Could you please try the following? I haven't tested it with a huge dataset, but it should definitely be faster than the OP's current approach. To put an absolute path before the output file name, we can pass a shell variable into an awk variable and use it when building outputFile.
awk '
{
close(outputFile)
outputFile=("output_file_"substr($0,16,1))
print >> (outputFile)
}
' Input_file
To save the files under a complete folder path, use the following; please change /tmp/test/ to your actual path.
DIR_OUT="/tmp/test/"
awk -v folder="${DIR_OUT}" '
{
close(outputFile)
outputFile=(folder"arq"substr($0,16,1)".txt")
print >> (outputFile)
}
' Input_file
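A possible variation, offered as an untested-at-scale sketch rather than a replacement: when only a handful of distinct letters occur, awk can keep each output file open instead of calling close() on every line, and a regex test can restrict the output to the four letters handled in the question. The names demo_dir and input.txt, and the extra X record in the sample data, are assumptions made for this demo.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so the demo does not touch real files.
demo_dir=$(mktemp -d)

# Sample records from the question, plus one X record (added here) that should be skipped.
cat > "$demo_dir/input.txt" <<'EOF'
003402841000011A10CNPJ08963394000195
003402841000041B20CNPJ08963394000195 16012020XX5313720087903007
003402841000011A10CNPJ08963394000195
003402841000041B20CNPJ08963394000195 16012020XX5313720087903007
003402841000011X10CNPJ08963394000195
EOF

awk -v folder="$demo_dir/" '
{
    letter = substr($0, 16, 1)
    # Write only the record types A, B, K and 1; every other line is ignored.
    if (letter ~ /^[ABK1]$/)
        print >> (folder "arq" letter ".txt")   # each file stays open until awk exits
}
' "$demo_dir/input.txt"

# Show how many lines landed in each output file.
wc -l "$demo_dir"/arq*.txt
```

Keeping files open relies on the number of distinct output files staying well below the system's open-file limit; with many distinct letters, closing files as in the answer above is the safer choice.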
Upvotes: 4