Reputation: 844
I have a text file that can have X number of fields, each separated by a comma. In my script I read line by line, checking how many fields have been populated on that line and determining how many commas I need to append to the end of that line to represent all the fields. For instance, a file that looks like this:
Address,nbItems,item1,item2,item3,item4,item5,item6,item7
2325988023,7,1,2,3,4,5,6,7
2327036284,5,1,2,3,4,5
2326168436,4,1,2,3,4
Should become this:
Address,nbItems,item1,item2,item3,item4,item5,item6,item7
2325988023,7,1,2,3,4,5,6,7
2327036284,5,1,2,3,4,5,,
2326168436,4,1,2,3,4,,,
My script below works, but it seems terribly inefficient. Is it the line-by-line reading that struggles on large files? Is it the sed that causes the slowdown? Is there a better way to do this?
#!/bin/bash
lineNum=0
numFields=`head -1 File.txt | egrep -o "," | wc -l`
cat File.txt | while read LINE
do
    lineNum=`expr 1 + $lineNum`
    num=`echo $LINE | egrep -o "," | wc -l`
    needed=$(( numFields - num ))
    for (( i=0 ; i < $needed ; i++ ))
    do
        sed -i "${lineNum}s/$/,/" File.txt
    done
done
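For context on where the time goes: every sed -i invocation rewrites the whole file, and the script runs sed once per missing comma on every line, so the work grows roughly with lines × missing fields × file size. Here is a minimal single-pass sketch of the same comma-counting logic (a rewrite for illustration, not the original script; it writes to a temporary file once instead of editing File.txt in place on every iteration):
#!/bin/bash
# Count commas in the header once, then pad each line as it is printed.
numFields=$(head -1 File.txt | tr -cd ',' | wc -c)
while IFS= read -r line; do
    num=$(printf '%s' "$line" | tr -cd ',' | wc -c)
    pad=""
    for (( i = num; i < numFields; i++ )); do
        pad+=","
    done
    printf '%s%s\n' "$line" "$pad"
done < File.txt > File.txt.tmp && mv File.txt.tmp File.txt
The tr -cd ',' | wc -c pipeline counts commas just like the egrep -o pipeline does; the key difference is that the output file is written exactly once.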
Upvotes: 3
Views: 5076
Reputation: 531808
Here's a full bash solution.
(
    IFS=","   # comma-separated fields; the subshell keeps the IFS change local
    # pass the header through unchanged, but remember its field count
    read hdrLine
    echo "$hdrLine"
    read -a header <<< "$hdrLine"
    numFields="${#header[@]}"
    while read -a line; do
        pad=${#line[@]}
        # append empty fields until the line is as wide as the header
        while (( pad < numFields )); do
            line[pad++]=
        done
        echo "${line[*]}"   # with IFS="," the fields are re-joined with commas
    done
) < File.txt > newFile.txt
mv newFile.txt File.txt
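For example, with the sample File.txt from the question (assuming the snippet above is saved as pad.sh):
$ bash pad.sh
$ cat File.txt
Address,nbItems,item1,item2,item3,item4,item5,item6,item7
2325988023,7,1,2,3,4,5,6,7
2327036284,5,1,2,3,4,5,,
2326168436,4,1,2,3,4,,,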
The awk solution is far better; this is best viewed as a bash demo.
Upvotes: 0
Reputation: 9936
This type of thing is usually best done with a language like awk, for example:
awk 'NR==1{n=NF}{$n=$n}1' FS=, OFS=, file
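To unpack the one-liner (my annotation, not the answerer's): on the first record it remembers the header's field count in n; on every record the no-op-looking assignment $n=$n forces awk to rebuild the record with OFS, which materializes any missing fields up to n as empty strings; the trailing 1 is an always-true pattern that prints each rebuilt line. Spread out with comments:
awk '
    NR == 1 { n = NF }   # remember the header's field count
    { $n = $n }          # assigning field n to itself rebuilds $0 with OFS,
                         # creating empty fields up to n as needed
    1                    # always-true pattern: print the rebuilt record
' FS=, OFS=, file
If you have GNU awk 4.1 or later, adding -i inplace edits the file directly, mirroring the sed -i behaviour of the original script:
awk -i inplace 'NR==1{n=NF}{$n=$n}1' FS=, OFS=, file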
Upvotes: 11