Alex B.

Extract lines based on a column matching one of multiple values

I have some files containing the following data:

 160-68 160 68 B-A 0011 3.80247
 160-68 160 68 B-A 0022 3.73454
 160-69 160 69 B-A 0088 2.76641
 160-69 160 69 B-A 0022 3.54446
 160-69 160 69 B-A 0088 4.24609
 160-69 160 69 B-A 0011 3.97644
 160-69 160 69 B-A 0021 1.82292

I need to extract the lines whose 5th column matches any of the values in an array (the values can be negative, e.g. -12222).

Expected output for the values [0088, 0021]:

160-69 160 69 B-A 0088 2.76641
160-69 160 69 B-A 0088 4.24609
160-69 160 69 B-A 0021 1.82292

I'm currently doing this with Ruby, but is there a way to do it faster with Bash?

Thanks.

Upvotes: 2

Answers (3)

Varun

Another solution

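    #!/bin/bash
    # Read input.txt once and print each line whose 5th column
    # equals one of the script's arguments
    while IFS= read -r line
    do
        read -r -a fields <<< "$line"    # split the line on whitespace
        for value in "$@"
        do
            if [ "${fields[4]}" = "$value" ]
            then
                printf '%s\n' "$line"
                break                    # print a line at most once
            fi
        done
    done < input.txt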

where input.txt is the data file and the script is called as ./scriptname 0088 0021
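With bash 4 or later, one way to drop the inner loop over the values is to load them into an associative array once and do a single lookup per line. This is a sketch, not part of the original answer; the function name `filter_col5` is made up, and it reads from stdin so it can be used as `filter_col5 0088 0021 < input.txt`. It is still a bash loop, so for large inputs awk or perl is likely to be much faster.

```shell
#!/bin/bash
# Hypothetical helper (bash 4+ for associative arrays): print the lines read
# from stdin whose 5th whitespace-separated column equals one of the arguments.
filter_col5() {
    local -A wanted=()
    local v line
    local -a fields
    for v in "$@"; do
        wanted[$v]=1                    # keys are exact strings: "0088" != "88"
    done
    while IFS= read -r line; do
        read -r -a fields <<< "$line"   # split on whitespace
        # Skip lines with fewer than 5 columns, then do one hash lookup
        if [[ -n ${fields[4]-} ]] && [[ -n ${wanted[${fields[4]}]-} ]]; then
            printf '%s\n' "$line"       # print the original, unmodified line
        fi
    done
}

# Sample data from the question; prints the two 0088 lines and the 0021 line
filter_col5 0088 0021 <<'EOF'
 160-68 160 68 B-A 0011 3.80247
 160-68 160 68 B-A 0022 3.73454
 160-69 160 69 B-A 0088 2.76641
 160-69 160 69 B-A 0022 3.54446
 160-69 160 69 B-A 0088 4.24609
 160-69 160 69 B-A 0011 3.97644
 160-69 160 69 B-A 0021 1.82292
EOF
```

Because the whole file is read once, the matching lines come out in file order.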

Upvotes: -1

peak

Here's an egrep-based solution.

Suppose the array of special values is given as a simple CSV string, e.g.

    A="0088,0021"

Then the following invocation of egrep will select the desired lines:

    egrep "( [^ ]+){3} ($(tr , '|' <<< "$A")) " file
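To see what the command substitution expands to, here is a small sketch that builds the alternation from $A and runs the same pattern on the question's sample lines (a here-document stands in for the input file). It uses `grep -E`, the POSIX spelling of egrep; the variable name `alts` is only for illustration.

```shell
#!/bin/bash
# Build the alternation from the CSV string, then apply the answer's pattern
A="0088,0021"
alts=$(tr , '|' <<< "$A")
printf '%s\n' "$alts"                   # 0088|0021

# Same pattern as in the answer; prints the two 0088 lines and the 0021 line
grep -E "( [^ ]+){3} ($alts) " <<'EOF'
 160-68 160 68 B-A 0011 3.80247
 160-68 160 68 B-A 0022 3.73454
 160-69 160 69 B-A 0088 2.76641
 160-69 160 69 B-A 0022 3.54446
 160-69 160 69 B-A 0088 4.24609
 160-69 160 69 B-A 0011 3.97644
 160-69 160 69 B-A 0021 1.82292
EOF
```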

In practice, it would probably be better to modify the regex above to make it less brittle with respect to the input format: as written, it assumes single spaces between columns and a space after the 5th column.
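One possible stricter pattern, sketched here as an assumption rather than taken from the answer: anchor at the start of the line, skip exactly four blank-separated fields, and require one of the values to be the whole 5th field. It tolerates runs of spaces or tabs and a 5th column that ends the line, and it no longer matches a value sitting in the 4th column of a line that starts with a space. The variable names are illustrative.

```shell
#!/bin/bash
# Stricter variant: ^ blanks, four "field + blanks" groups, then the value
# followed by a blank or the end of the line
A="0088,0021"
alts=$(tr , '|' <<< "$A")
pattern="^[[:blank:]]*([^[:blank:]]+[[:blank:]]+){4}($alts)([[:blank:]]|\$)"

# Compare the original and stricter patterns on a few illustrative lines
for line in ' x y z 0088 w' 'a b c d 0088' ' 160-69 160 69 B-A 0088 2.76641'; do
    old=no new=no
    grep -qE "( [^ ]+){3} ($alts) " <<< "$line" && old=yes
    grep -qE "$pattern" <<< "$line" && new=yes
    printf '%s -> original: %s, stricter: %s\n' "'$line'" "$old" "$new"
done
# ' x y z 0088 w' -> original: yes, stricter: no
# 'a b c d 0088' -> original: no, stricter: yes
# ' 160-69 160 69 B-A 0088 2.76641' -> original: yes, stricter: yes
```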

If the elements of the array ($A) contain characters that are special to egrep (such as square brackets, parentheses, etc.), then some care will be required to escape them. This can be done programmatically, e.g.

    A=$(sed 's/[]\.|$(){}?+*^]/\\&/g' <<< "$A")
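A quick check of the escaping step on a made-up list of values (the values and the variable name `alts` are hypothetical): metacharacters get a backslash, while a negative value such as -12222 passes through unchanged because `-` is not special outside a bracket expression.

```shell
#!/bin/bash
# Apply the answer's escaping step, then build the alternation as before
A="1.5,(x),-12222"
A=$(sed 's/[]\.|$(){}?+*^]/\\&/g' <<< "$A")
printf '%s\n' "$A"                      # 1\.5,\(x\),-12222
alts=$(tr , '|' <<< "$A")
printf '%s\n' "$alts"                   # 1\.5|\(x\)|-12222

# The escaped dot only matches a literal ".", so "105" is rejected
if grep -qE "^($alts)\$" <<< "105"; then
    echo "105 matches"
else
    echo "105 does not match"
fi
```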


Upvotes: 1

glenn jackman

Bash is unlikely to be faster than Ruby: bash is generally pretty slow at processing text line by line. I'd pick awk or perl:

    awk -v values="0088 0021" '
        BEGIN {
            n = split(values, a)
            for (i=1; i<=n; i++) b[a[i]]=1
        }
        $5 in b
    ' file

    perl -ane 'BEGIN {%v = ("0088"=>1, "0021"=>1)} print if $v{$F[4]}' file
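Since the question starts from an array of values, here is a sketch of feeding a bash array into the awk program above. The wrapper name `select_col5` is hypothetical, and it assumes the values contain no whitespace. Awk array subscripts are strings, so "0088" does not match a column containing 88, and a negative value such as -12222 needs no special handling; it is included below only to show that (it does not occur in the sample data).

```shell
#!/bin/bash
# Hypothetical wrapper around the answer's awk program: the values to match
# are the function's arguments and the data is read from stdin.
select_col5() {
    # "$*" joins the arguments with single spaces (first character of IFS)
    awk -v values="$*" '
        BEGIN {
            n = split(values, a)        # split the list on whitespace
            for (i = 1; i <= n; i++) b[a[i]] = 1
        }
        $5 in b                         # print lines whose 5th field is a key
    '
}

values=(0088 0021 -12222)
# Prints the two 0088 lines and the 0021 line from the question's data
select_col5 "${values[@]}" <<'EOF'
 160-68 160 68 B-A 0011 3.80247
 160-68 160 68 B-A 0022 3.73454
 160-69 160 69 B-A 0088 2.76641
 160-69 160 69 B-A 0022 3.54446
 160-69 160 69 B-A 0088 4.24609
 160-69 160 69 B-A 0011 3.97644
 160-69 160 69 B-A 0021 1.82292
EOF
```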

Upvotes: 4
