Reputation: 40758
Assume that we have a command cmd that only takes input through a pipe. Given a filename file.txt, what is the most efficient way to pipe this into the command? (I assume cat file.txt | cmd is not very efficient.)
Upvotes: 0
Views: 6495
Reputation: 40758
Here are some other timing results, comparing several ways of feeding a 100-line file to awk, each repeated 1000 times:
# Create a 100-line test file
fn="file.txt"
[[ -e $fn ]] && rm "$fn"
for i in {1..100} ; do
    echo "Line $i" >> "$fn"
done

arg='{print FNR, $0}'
N=1000

# func1: awk opens the file itself (baseline, no piped input)
func1() {
    for i in $(seq 1 $N) ; do
        awk "$arg" "$fn" > "$temp_file"
    done
}

# func2: pipe from cat
func2() {
    for i in $(seq 1 $N) ; do
        cat "$fn" | awk "$arg" > "$temp_file"
    done
}

# func3: shell read loop with printf, piped to awk
func3() {
    for i in $(seq 1 $N) ; do
        while read line ; do
            printf "%s\n" "$line"
        done <"$fn" | awk "$arg" > "$temp_file"
    done
}

# func4: shell read loop with echo, piped to awk
func4() {
    for i in $(seq 1 $N) ; do
        while read line ; do
            echo "$line"
        done <"$fn" | awk "$arg" > "$temp_file"
    done
}

# func5: read the file into an array, then print it into the pipe
func5() {
    for i in $(seq 1 $N) ; do
        readarray -t a <"$fn"
        printf "%s\n" "${a[@]}" | awk "$arg" > "$temp_file"
    done
}

# func6: input redirection
func6() {
    for i in $(seq 1 $N) ; do
        awk "$arg" > "$temp_file" <"$fn"
    done
}

# Run func<N> and print its name with the "real" time in seconds
time_it() {
    temp_file="tmp_out$1.txt"
    name="func$1"
    { time "$name"; } |& awk -vfn="$name" 'NR==2 {print fn, substr( $2, 3, length( $2) - 3 ) }'
}

for i in {1..6} ; do
    time_it $i
done
The output for a single run on my Ubuntu laptop was:
func1 1.558
func2 2.273
func3 1.704
func4 1.427
func5 2.188
func6 1.576
Note that func1 is only used for comparison; it does not use piped input. For this particular run, func4 and func6 were approximately as fast as func1. Since this is a single run, small differences between the functions may just be noise.
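As a sanity check on the comparison above, here is a minimal sketch confirming that the file-argument, cat-pipe and redirection variants (func1, func2 and func6) produce identical output, so the timings compare like with like. The temporary file and the variable names out_arg, out_pipe and out_redir are made up for illustration.

```shell
#!/bin/bash
# Hypothetical sample file; five lines are enough to compare outputs.
fn=$(mktemp)
for i in 1 2 3 4 5; do
    echo "Line $i"
done > "$fn"

arg='{print FNR, $0}'

out_arg=$(awk "$arg" "$fn")          # file name as argument (as in func1)
out_pipe=$(cat "$fn" | awk "$arg")   # piped from cat (as in func2)
out_redir=$(awk "$arg" < "$fn")      # input redirection (as in func6)

if [ "$out_arg" = "$out_pipe" ] && [ "$out_pipe" = "$out_redir" ]; then
    echo "identical"
else
    echo "different"
fi

rm -f "$fn"
```

Only the way awk receives the data differs between the three calls; the result is the same.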
Upvotes: 1
Reputation: 4681
Let's do a little test with a 1 GB blob (dump.data):
Using the < operator (input redirection) is much faster than piping from cat:
$ time cat dump.data | cat >/dev/null
real 0m0.360s
user 0m0.000s
sys 0m0.608s
$ time cat <dump.data >/dev/null
real 0m0.158s
user 0m0.000s
sys 0m0.156s
If cmd accepted a filename as its argument and read the file itself, that should be at best marginally faster than <: in both cases there is no IPC involved and only one process works with the data (with <, the shell just opens the file and passes it to cmd as standard input). Indeed, it makes no difference in this test:
$ time cat dump.data >/dev/null
real 0m0.158s
user 0m0.000s
sys 0m0.156s
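One way to see where the extra cost of the cat pipe comes from is to check what standard input is connected to in each case. The following is a minimal sketch, assuming a system where /dev/stdin exists (e.g. Linux); the function name stdin_kind and the temporary file are made up for illustration.

```shell
#!/bin/bash
# Report what kind of object standard input is connected to.
# Assumption: /dev/stdin is available (true on Linux).
stdin_kind() {
    if [ -p /dev/stdin ]; then
        echo "pipe"
    elif [ -f /dev/stdin ]; then
        echo "regular file"
    else
        echo "other"
    fi
}

# Hypothetical input file
tmp=$(mktemp)
echo "some data" > "$tmp"

kind_pipe=$(cat "$tmp" | stdin_kind)   # data is copied through a pipe by cat
kind_redir=$(stdin_kind < "$tmp")      # stdin is the file itself

echo "cat file | cmd : $kind_pipe"
echo "cmd < file     : $kind_redir"

rm -f "$tmp"
```

With the pipe, a second process has to copy every byte into the pipe before cmd can read it; with <, cmd reads from the file directly.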
Upvotes: 4