Håkon Hægland

Reputation: 40758

Efficient way to pipe file contents to program

Assume that we have a command cmd that only takes input through a pipe. Given a file file.txt, what is the most efficient way to pipe its contents into the command? (I assume cat file.txt | cmd is not very efficient.)

Upvotes: 0

Views: 6495

Answers (2)

Håkon Hægland

Reputation: 40758

Here are some other timing results:

fn="file.txt"
[[ -e $fn ]] && rm "$fn"

for i in {1..100} ; do
    echo "Line $i" >> "$fn"
done
arg='{print FNR, $0}'
N=1000
func1() {
    for i in $(seq 1 $N) ; do
        awk "$arg" "$fn" > "$temp_file"
    done
}
func2() {
    for i in $(seq 1 $N) ; do
        cat "$fn" | awk "$arg" > "$temp_file" 
    done
}
func3() {
    for i in $(seq 1 $N) ; do
        while read line ; do
            printf "%s\n" "$line"
        done <"$fn" | awk "$arg" > "$temp_file"
    done
}
func4() {
    for i in $(seq 1 $N) ; do
        while read line ; do
            echo "$line"
        done <"$fn" | awk "$arg" > "$temp_file"
    done
}

func5() {
    for i in $(seq 1 $N) ; do
        readarray -t a <"$fn"
        printf "%s\n" "${a[@]}" | awk "$arg" > "$temp_file"
    done
}

func6() {
    for i in $(seq 1 $N) ; do
        awk "$arg" > "$temp_file" <"$fn"
    done
}

time_it() {
    temp_file="tmp_out$1.txt"
    name="func$1"
    { time "$name"; } |& awk -vfn="$name" 'NR==2 {print fn, substr( $2, 3, length( $2) - 3 ) }'
}

for i in {1..6} ; do
    time_it $i
done

The output (real time in seconds) for a single run on my Ubuntu laptop was:

func1 1.558
func2 2.273
func3 1.704
func4 1.427
func5 2.188
func6 1.576

Note that func1 is only used for comparison; it does not use piped input. We see that for this particular run, func4 and func6 were approximately as fast as func1.
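
Since these numbers come from a single run, repeating each measurement gives a feel for the run-to-run spread. The following is only a sketch, assuming bash: `elapsed` is a hypothetical helper (not part of the script above), and `sleep 0.1` stands in for one of the functions being timed. It relies on bash's `TIMEFORMAT` variable so that `time` prints only the real time.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the wall-clock seconds a command takes.
elapsed() {
    local TIMEFORMAT=%R    # bash: make `time` report only the real time
    # The command's own output is discarded; the report from `time` goes to
    # the group's stderr, which is redirected to stdout here.
    { time "$@" >/dev/null 2>&1; } 2>&1
}

# Repeat the same measurement a few times to see how much it varies.
for run in 1 2 3; do
    printf 'run %s: %s s\n' "$run" "$(elapsed sleep 0.1)"
done
```

With the original script, `elapsed func4` could be used in place of `elapsed sleep 0.1`, provided `temp_file` is set first.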

Upvotes: 1

Michael Jaros

Reputation: 4681

Let's do a little test with a 1 GB blob (dump.data):

Using the < operator is much faster than piping from cat:

$ time cat dump.data | cat >/dev/null

real    0m0.360s
user    0m0.000s
sys     0m0.608s

$ time cat <dump.data >/dev/null

real    0m0.158s
user    0m0.000s
sys     0m0.156s

In theory, the only thing that should be a little faster than < is for cmd to accept a filename as its argument and read the file itself (because there is no IPC involved: only one process works with the data). However, it does not make any difference in this test:

$ time cat dump.data >/dev/null

real    0m0.158s
user    0m0.000s
sys     0m0.156s
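
Applied to the question, the two variants can be sketched as follows. The function `cmd` here is a hypothetical stand-in for a program that only reads standard input, and the temporary file and its contents are made up for illustration. With redirection, the shell opens the file and the command inherits it as standard input, so no extra cat process is started.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a command that only reads standard input.
cmd() { wc -l; }

# Sample input file (name and contents are made up for this sketch).
tmp=$(mktemp)
printf 'Line %s\n' 1 2 3 > "$tmp"

# Pipe from cat: an extra process plus a pipe between the two.
via_pipe=$(cat "$tmp" | cmd)

# Redirection: the shell opens the file and cmd reads it as stdin.
via_redirect=$(cmd < "$tmp")

echo "pipe: $via_pipe, redirect: $via_redirect"
rm -f "$tmp"
```

Both variants hand the same bytes to cmd; only the way standard input is set up differs.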

Upvotes: 4
