Reputation: 8808
I'm searching 2.txt for each pattern in 1.txt using grep, and then doing some manipulation on the matches.
However, grep seems to be too slow for large text.
for (( i=1; i<=236410; i++ ))
do
head -$i 1.txt|tail -1|grep -f - 2.txt|awk '{mul+=$4*$7} END {print $1,$2,$3,mul}'
done > file1
Is there any alternative? It seems awk or sed can do this, but I don't know how to pipe the output of head -$i 1.txt|tail -1 into awk or sed.
Thanks.
Upvotes: 0
Views: 1870
Reputation: 189739
Assuming your pattern file has 236,410 lines, that grep can handle that amount of input, and that the order of the output file is not significant, why not just do
grep -f 1.txt 2.txt | awk ... >file1
If memory is an issue, and your patterns are static strings, try fgrep (that is, grep -F) instead; it can handle a larger number of patterns. If the order of the output is in fact significant, something like this should be a lot faster:
while read line; do
grep "$line" 2.txt | awk ...
done <1.txt >file1
Depending on the input, you may want to muck with IFS and/or add some option to read to handle whitespace, backslashes, etc.
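Here is a minimal sketch of that loop with the whitespace and backslash handling filled in. It assumes the patterns in 1.txt are literal strings rather than regular expressions (hence grep -F), and the function name per_pattern_sums is made up for illustration. Unlike the awk command in the question, it saves the first three fields while reading and prints nothing for a pattern with no matches; adjust that if you need a line for every pattern.

```shell
#!/bin/sh
# Hypothetical helper: prints one summary line per pattern, in 1.txt order.
# Usage: per_pattern_sums PATTERN_FILE DATA_FILE
per_pattern_sums() {
    patterns=$1
    data=$2
    # IFS= preserves leading/trailing whitespace; -r keeps backslashes literal.
    while IFS= read -r line; do
        # -F treats the pattern as a fixed string (assumption: no regexes);
        # -e protects patterns that begin with a dash.
        grep -F -e "$line" "$data" |
            awk '{ a = $1; b = $2; c = $3; mul += $4 * $7 }
                 END { if (NR) print a, b, c, mul }'
    done <"$patterns"
}
```

It would be called as per_pattern_sums 1.txt 2.txt >file1. This still starts one grep and one awk per pattern, but it no longer rereads 1.txt with head and tail on every iteration.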
If you only want the 236,410 first lines of input, you can change this to
head -n 236410 1.txt |
while read line ...
If none of the above suit you, here's another idea. Since you are using awk for the actual processing anyway, you might be able to refactor all of the processing into an awk script, or create a sed script on the fly and pass the output of that to awk. This is a bit involved, and again depends on what your patterns look like, but something like this should give you an idea:
sed 's%.*%/&/p%' 1.txt | less
What you are looking at is a sed script which prints a line if it matches any of the patterns in 1.txt. (It will break if any pattern contains a forward slash. In the trivial case, use a different delimiter, or escape all slashes in the patterns.) Now you can save that to a file, or (if your sed can handle a script on standard input) pass it to a second instance of sed:
sed 's%.*%/&/p%' 1.txt | sed -f - -n 2.txt | less
And that is what you would pass to awk:
sed 's%.*%/&/p%' 1.txt | sed -f - -n 2.txt | awk ... >file1
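As for refactoring everything into a single awk script, the following is only a rough sketch under some assumptions: the patterns are literal strings (matched with index() rather than as regexes), 1.txt is non-empty and fits in memory, and you want one summed line per pattern as in the original loop. The function name awk_pattern_sums is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: per-pattern sums computed by a single awk process.
# Usage: awk_pattern_sums PATTERN_FILE DATA_FILE
awk_pattern_sums() {
    awk '
        # First file (patterns): remember each pattern in input order.
        # Assumes the pattern file is non-empty, otherwise NR == FNR
        # would also be true for the data file.
        NR == FNR { pat[++n] = $0; next }
        # Second file (data): credit the line to every pattern it contains.
        {
            for (i = 1; i <= n; i++)
                if (index($0, pat[i])) {
                    f1[i] = $1; f2[i] = $2; f3[i] = $3
                    mul[i] += $4 * $7
                    seen[i] = 1
                }
        }
        # Print one line per pattern that matched, in pattern order.
        END {
            for (i = 1; i <= n; i++)
                if (seen[i]) print f1[i], f2[i], f3[i], mul[i]
        }
    ' "$1" "$2"
}
```

It would be called as awk_pattern_sums 1.txt 2.txt >file1. It still compares every line of 2.txt against every pattern, so it is not guaranteed to be fast for 236,410 patterns, but it reads each file once and avoids starting a new process per pattern.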
Upvotes: 1