wwwilliam

Reputation: 9592

Can I make this bash script that writes to multiple files run any faster?

I have a script that reads lines from a file, takes the first column of each line, and appends the line to a file named after that value (i.e. it writes out many different files, each named $id.txt).

Is it possible for a script to do this any faster (on a single-node machine)? Note that I use read -r and id="$(echo "$line" | awk '{print $1}')" because my fields are tab-separated and some of them contain characters, such as backslashes, that I want to keep intact.

    while read -r line
    do
        # grab the first tab-separated column; quoting preserves the tabs
        id="$(echo "$line" | awk '{print $1}')"
        echo "$line" >> "$id.txt"
    done < "$1"

Some characteristics of my input:

abc ...
abc ...
def ...
def ...
def ...
def ...
ghi ...
ghi ...

Upvotes: 1

Views: 132

Answers (2)

Christopher Neylan

Reputation: 8272

I'm guessing that your slowness comes from running $(echo "$line" | awk '{print $1}') for every line, which forces the operating system to create new processes on every iteration, made worse by the fact that awk is an interpreter that has to start up each time. You should condense this into a single script using something like awk (by itself) or Perl.
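For instance, here is a rough sketch of that condensed approach (my illustration, not code from the answer), assuming the input file is passed as the script's first argument as in the question; the -F'\t' and the close() call are extra touches for tab-separated input and for inputs with many distinct ids:

    awk -F'\t' '{
        out = $1 ".txt"   # file named after the first tab-separated column
        print >> out      # $0 is written unmodified, so tabs and backslashes survive
        close(out)        # keep the number of open file descriptors bounded
    }' "$1"

Closing after every line trades a little speed for safety; since the sample input is grouped by id, one could instead close the previous file only when $1 changes.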

Upvotes: 1

Ignacio Vazquez-Abrams

Reputation: 798746

Too much work.

    awk '{ print >> ($1 ".txt") }' "$1"
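Why this is likely so much faster: a single awk process reads the file once, print writes $0 unmodified so tabs and backslashes pass through untouched, and each output file stays open for appending instead of being reopened on every line. The parentheses around $1 ".txt" are a portability nicety; some awk implementations reject an unparenthesized concatenation after >>.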

Upvotes: 6
