Reputation: 411
I am using the grep command to extract the required information from a file. I am using two grep statements like the ones below:
XXXX=`grep XXXX FILE A|sort|uniq|wc -l`
grep YYYY FILE A|uniq| > FILE B
Now the file is traversed twice. I would like to know whether I can do these two steps in a single traversal of the file, i.e. whether I can use something similar to egrep to search for two strings at once, storing the result for one string in a variable and writing the output for the other string to a file.
Upvotes: 3
Views: 3000
Reputation: 212654
There is a trailing '|' symbol in your question, and perhaps you intended the YYYY lines to also be piped to sort (or use sort -u!), in which case you could simply do:
awk '/XXXX/ { if( !x[$0]++ ) xcount += 1 }
     /YYYY/ { if( !y[$0]++ ) ycount += 1 }
     END { print "XXXX:", xcount + 0    # + 0 prints 0 instead of an empty string when nothing matched
           print "YYYY:", ycount + 0
           for( i in y ) print i | "sort > FILEB"
     }' FILE
This scans the file once, incrementing the counter whenever a unique line containing the appropriate pattern is seen. Note that the order of iteration over the array of YYYY lines is not well defined here, so the sort is necessary. Some versions of awk can sort the array without relying on an external utility, but not all do. Use perl if you want to do that.
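If the count has to end up in a shell variable and the YYYY lines should keep their original order (as with plain uniq in the question), a variation along these lines might work. This is only a sketch: the sample data, the temporary directory, and the names `infile`, `outfile`, `seen`, `prev` and `have_prev` are made up for illustration.

```shell
# Hypothetical sample data; replace with the real input file.
workdir=$(mktemp -d)
infile="$workdir/input.txt"
outfile="$workdir/yyyy.txt"
printf '%s\n' 'a XXXX 1' 'b YYYY 1' 'b YYYY 1' 'a XXXX 1' \
              'c XXXX 2' 'd YYYY 2' 'plain' > "$infile"
: > "$outfile"   # make sure the output file exists even if nothing matches

# One pass over the file:
#  - count distinct XXXX lines (like sort | uniq | wc -l)
#  - write YYYY lines to $outfile, dropping adjacent repeats (like uniq)
# Only the count goes to stdout, so command substitution captures it.
XXXX=$(awk -v out="$outfile" '
    /XXXX/ { if (!seen[$0]++) n++ }
    /YYYY/ { if (!have_prev || $0 != prev) print > out
             prev = $0; have_prev = 1 }
    END    { print n + 0 }
' "$infile")

echo "distinct XXXX lines: $XXXX"
cat "$outfile"
```

The YYYY lines are written by awk itself, so nothing but the count reaches the command substitution.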
Upvotes: 0
Reputation: 4841
You can use the following code. Here we search the file only once for lines containing XXXX or YYYY and store the matching lines in a variable. Then we filter the contents of that variable to select the lines containing XXXX and the lines containing YYYY. The variable is quoted so that lines containing spaces are not split into separate words.
filtered=`grep -E '(XXXX|YYYY)' FILE A`
XXXX=`printf '%s\n' "$filtered" | grep XXXX | sort | uniq | wc -l`
printf '%s\n' "$filtered" | grep YYYY | uniq > FILE B
So the file is not traversed twice!
Upvotes: 1
Reputation: 12572
Or use egrep with a disjunction:
egrep '(XXXX|YYYY)' FILE A | sort | uniq | ...
Or awk:
awk '/XXXX|YYYY/' FILE A | sort | uniq | ...
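One possible way to fill in the `...` is to let a small awk script split the combined stream again. This is a sketch under assumptions: the sample data and the names `infile`, `outfile` and `count` are hypothetical, and because everything passes through `sort -u`, the YYYY output is sorted and fully de-duplicated, which differs from plain uniq.

```shell
# Hypothetical sample data; replace with the real input file.
workdir=$(mktemp -d)
infile="$workdir/input.txt"
outfile="$workdir/yyyy.txt"
printf '%s\n' 'x YYYY' 'a XXXX' 'a XXXX' 'neither' 'b XXXX YYYY' > "$infile"
: > "$outfile"   # make sure the output file exists even if nothing matches

# grep reads the file once; awk only sees the already filtered,
# sorted, de-duplicated lines and routes them by pattern.
count=$(grep -E 'XXXX|YYYY' "$infile" | sort -u |
    awk -v out="$outfile" '
        /XXXX/ { n++ }           # every line here is already unique
        /YYYY/ { print > out }
        END    { print n + 0 }')

echo "distinct XXXX lines: $count"
cat "$outfile"
```

A line that contains both strings is counted and also written to the file.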
Upvotes: 0