bennos

Reputation: 323

Paste column to existing file in a loop

I am using the paste command in a bash loop to add new columns to a CSV file. I would like to reuse the CSV file. Currently I am using a temporary file to accomplish this:

while [ $i -le $max ]
do
    # create text from grib2
    wgrib2 -d 1.$(($i+1)) -no_header myGribFile.grb2 -text tmptxt.txt

    # paste the new column onto a temporary file
    paste -d, existingfile.csv tmptxt.txt > tmpcsv.csv

    # overwrite old csv with new csv
    mv tmpcsv.csv existingfile.csv

    ((i++))
done

After adding some columns the copy is getting slow, because the file keeps growing (each tmptxt.txt is about 2 MB, adding up to approx. 100 MB).

A tmptxt.txt is a plain text file with one column and one value per row:

1
2
3
.
.

The existingfile.csv would then be

1,1,x
2,2,y
3,3,z
.,.,.
.,.,.

Is there any way to use the paste command to add a column to an existing file? Or is there any other way?

Thanks

Upvotes: 6

Views: 8723

Answers (2)

German Garcia

Reputation: 1239

Would it be feasible to split the operation in two? One step to generate all the intermediate files, and another to generate the final output file. The idea is to avoid rereading and rewriting the final file over and over.

The changes to the script would be something like this:

while [ $i -le $max ]
do
    n=$(printf "%05d" $i)    # to preserve lexical order if $max > 9
    # create text from grib2
    wgrib2 -d 1.$(($i+1)) -no_header myGribFile.grb2 -text tmptxt$n.txt
    ((i++))
done

#make final file
paste -d, existingfile.csv tmptxt[0-9]*.txt > tmpcsv.csv  

#overwrite old csv with new csv
mv tmpcsv.csv existingfile.csv
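As a quick sanity check of the two-phase idea, here is a miniature, self-contained run with made-up stand-in data (no wgrib2 involved): generate all the column files first, then paste them onto the base file in a single call.

```shell
# Hypothetical miniature of the two-phase approach.
printf 'a\nb\nc\n' > existingfile.csv        # stand-in base file

for i in 1 2; do
    n=$(printf "%05d" "$i")
    # stand-in for the wgrib2 output: one value per row
    printf '%s\n%s\n%s\n' "$i" "$i" "$i" > "tmptxt$n.txt"
done

# one paste over all column files, one rewrite of the CSV
paste -d, existingfile.csv tmptxt[0-9]*.txt > tmpcsv.csv
mv tmpcsv.csv existingfile.csv
cat existingfile.csv
# a,1,2
# b,1,2
# c,1,2
```

Because the zero-padded names sort lexically, the glob expands the column files in the order they were generated, so the columns land in the right order.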

Upvotes: 6

doubleDown

Reputation: 8398

Assuming the number of lines output by the program is constant and equal to the number of lines in existingfile.csv (which should be the case, since you are using paste).

Disclaimer: I'm not exactly sure whether this will speed things up (it depends on whether the >> redirection writes to the file exactly once). Anyway, give it a try and let me know.

So the basic idea is

  1. append the output in one go after the loop is done (note the change: wgrib2 now prints to -, which is stdout)

  2. use awk to move every block of linenum rows (linenum being the number of lines in existingfile.csv) onto the end of the first linenum rows

    Save to tempcsv.csv (because I can't find a way to save to the same file in place)

  3. rename to / overwrite existingfile.csv


while [ $i -le $max ]; do
  # create text from grib2
  wgrib2 -d 1.$(($i+1)) -no_header myGribFile.grb2 -text -

  ((i++))
done >> existingfile.csv

# linenum = number of lines in existingfile.csv
awk -v linenum=4 '
  { array[FNR%linenum] = array[FNR%linenum] "," $0 }
  END { for (i = 1; i <= linenum; i++) print substr(array[i%linenum], 2) }
' existingfile.csv > tempcsv.csv

mv tempcsv.csv existingfile.csv

If this works the way I imagine it does internally, you end up with 2 writes to existingfile.csv instead of $max writes. So hopefully this speeds things up.
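Here is a miniature, self-contained run of the reshape step with made-up stand-in data (a 4-line base column plus two appended 4-line blocks, in place of the real wgrib2 output). The substr call strips the leading comma each row picks up, and the loop runs through i == linenum so that the i%linenum == 0 slot (the last row of each block) is printed too:

```shell
# Hypothetical miniature: base column plus two appended blocks.
printf '1\n2\n3\n4\n' >  existingfile.csv
printf 'a\nb\nc\nd\n' >> existingfile.csv
printf 'w\nx\ny\nz\n' >> existingfile.csv

# reshape 12 rows x 1 column into 4 rows x 3 columns
awk -v linenum=4 '
  { array[FNR%linenum] = array[FNR%linenum] "," $0 }
  END { for (i = 1; i <= linenum; i++) print substr(array[i%linenum], 2) }
' existingfile.csv > tempcsv.csv
mv tempcsv.csv existingfile.csv
cat existingfile.csv
# 1,a,w
# 2,b,x
# 3,c,y
# 4,d,z
```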

Upvotes: 0
