Reputation: 323
I am using the paste command in a bash loop to add new columns to a CSV file, reusing the same CSV file on every iteration. Currently I am going through a temporary file to accomplish this:
while [ $i -le $max ]
do
    # create text from grib2
    wgrib2 -d 1.$(($i+1)) -no_header myGribFile.grb2 -text tmptxt.txt
    # paste to temporary file
    paste -d, existingfile.csv tmptxt.txt > tmpcsv.csv
    # overwrite old csv with new csv
    mv tmpcsv.csv existingfile.csv
    ((i++))
done
After adding some columns the copy is getting slow, because the file keeps growing (every tmptxt.txt is about 2 MB, adding up to approx. 100 MB).
A tmptxt.txt is a plain text file with one column and one value per row:
1
2
3
.
.
The existingfile.csv would then be:
1,1,x
2,2,y
3,3,z
.,.,.
.,.,.
Is there any way to use the paste command to add a column to an existing file? Or is there any other way?
Thanks
Upvotes: 6
Views: 8723
Reputation: 1239
Would it be feasible to split the operation in two? One step generates all the intermediate files; the other generates the final output file. The idea is to avoid rereading and rewriting the final file over and over.
The changes to the script would be something like this:
while [ $i -le $max ]
do
    n=$(printf "%05d" $i)   # zero-pad to preserve lexical order if $max > 9
    # create text from grib2
    wgrib2 -d 1.$(($i+1)) -no_header myGribFile.grb2 -text tmptxt$n.txt
    ((i++))
done

# make final file
paste -d, existingfile.csv tmptxt[0-9]*.txt > tmpcsv.csv
# overwrite old csv with new csv
mv tmpcsv.csv existingfile.csv
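The zero-padding matters because the shell expands tmptxt[0-9]*.txt in lexical order, so without padding tmptxt10.txt would sort before tmptxt2.txt and the columns would be pasted in the wrong order. A purely illustrative way to check the order paste will see, and to clean up the intermediate files afterwards:

# show the order in which the glob expands (the order paste reads the files)
printf '%s\n' tmptxt[0-9]*.txt

# once existingfile.csv has been rebuilt, the intermediate files can go
rm tmptxt[0-9]*.txt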
Upvotes: 6
Reputation: 8398
Assuming the number of lines output by the program is constant and equal to the number of lines in existingfile.csv (which should be the case, since you are using paste).
Disclaimer: I'm not exactly sure whether this will speed things up (it depends on whether the >> redirection writes to the file exactly once or not). Anyway, give it a try and let me know.
So the basic idea is:
- Append the output in one go after the loop is done (note the change: wgrib2 now prints to -, which is stdout).
- Use awk to move every linenum rows (linenum being the number of lines in existingfile.csv) to the end of the first linenum rows.
- Save to tempcsv.csv (because I can't find a way to save to the same file).
- Rename to / overwrite existingfile.csv.
while [ $i -le $max ]; do
    # create text from grib2, writing to stdout instead of a file
    wgrib2 -d 1.$(($i+1)) -no_header myGribFile.grb2 -text -
    ((i++))
done >> existingfile.csv

# linenum = number of lines in the original existingfile.csv
awk -v linenum=4 '
    FNR <= linenum { array[FNR%linenum]=$0; next }      # original rows: keep as-is
    { array[FNR%linenum]=array[FNR%linenum]","$0 }      # appended blocks: add as new columns
    END { for(i=1;i<=linenum;i++) print array[i%linenum] }
' existingfile.csv > tempcsv.csv
mv tempcsv.csv existingfile.csv
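To make the awk step concrete, here is a purely illustrative run with made-up values and linenum=3 (three rows in the original CSV, two appended blocks). After the loop, existingfile.csv would look like:

1,x
2,y
3,z
10
20
30
100
200
300

and the awk command would turn it into:

1,x,10,100
2,y,20,200
3,z,30,300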
If this is how I imagine it working internally, you should have 2 writes to existingfile.csv instead of $max writes, so hopefully this speeds things up.
Upvotes: 0