Reputation: 69
I have some files containing the following data:
160-68 160 68 B-A 0011 3.80247
160-68 160 68 B-A 0022 3.73454
160-69 160 69 B-A 0088 2.76641
160-69 160 69 B-A 0022 3.54446
160-69 160 69 B-A 0088 4.24609
160-69 160 69 B-A 0011 3.97644
160-69 160 69 B-A 0021 1.82292
I need to extract the lines whose 5th column matches any of the values in an array (the values can be negative, e.g. -12222).
Expected output with the array [0088, 0021]:
160-69 160 69 B-A 0088 2.76641
160-69 160 69 B-A 0088 4.24609
160-69 160 69 B-A 0021 1.82292
I'm currently doing this with Ruby, but is there a way to do it faster with Bash?
Thanks.
Upvotes: 2
Views: 378
Reputation: 477
Another solution:
#!/bin/bash
# For each value given on the command line, scan input.txt and
# print every line whose 5th field equals that value.
for i in "$@"
do
    while read -r line
    do
        arr=($line)                 # split the line on whitespace
        if [ "${arr[4]}" = "$i" ]  # field 5 is index 4
        then
            echo "$line"
        fi
    done < input.txt
done
where input.txt is the data file, and you call the script as ./scriptname 0088 0021.
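Note that this rereads input.txt once for every argument. A single-pass variant (a sketch, keeping the same input.txt assumption) reads the file once and tests each line's 5th field against every value:
#!/bin/bash
# Read input.txt once; print a line as soon as its 5th field
# matches one of the command-line arguments.
while read -r line
do
    arr=($line)
    for v in "$@"
    do
        if [ "${arr[4]}" = "$v" ]
        then
            echo "$line"
            break
        fi
    done
done < input.txt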
Upvotes: -1
Reputation: 116987
Here's an egrep-based solution.
Suppose the array of special values is given as a simple CSV string, e.g.
A="0088,0021"
Then the following invocation of egrep will select the desired lines:
egrep "( [^ ]+){3} ($(tr , '|' <<< "$A")) "
In practice, it would probably be better to modify the regex above to make it less brittle with respect to the input format.
If the elements of the array ($A) contain characters that are special to egrep (such as square brackets, parentheses, etc.), then some care will be required to escape them. This can be done programmatically, e.g.
A=$(sed 's/[][\.|$(){}?+*^]/\\&/g' <<< "$A")
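For example (the bracketed value is purely illustrative):
A="0088,0021,a[1]"
A=$(sed 's/[][\.|$(){}?+*^]/\\&/g' <<< "$A")
echo "$A"    # prints 0088,0021,a\[1\]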
Upvotes: 1
Reputation: 247210
bash is unlikely to be faster than Ruby: bash is generally pretty slow. I'd pick awk or perl.
awk -v values="0088 0021" '
BEGIN {
    n = split(values, a)               # split the value list into array a
    for (i=1; i<=n; i++) b[a[i]] = 1   # build a lookup set b
}
$5 in b                                # print lines whose 5th field is in the set
' file
perl -ane 'BEGIN {%v = ("0088"=>1, "0021"=>1)} print if $v{$F[4]}' file
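Either command prints the three matching lines from the sample data. To take the values from the command line instead of hard-coding them, a small wrapper around the awk version might look like this (the name filter.sh is hypothetical):
#!/bin/bash
# Usage: ./filter.sh file 0088 0021
file=$1; shift
awk -v values="$*" '
BEGIN { n = split(values, a); for (i=1; i<=n; i++) b[a[i]] = 1 }
$5 in b
' "$file"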
Upvotes: 4