arnstrm

Reputation: 379

How to subtract fields pairwise in bash?

I have a large dataset that looks like this:

5 6 5 6 3 5
2 5 3 7 1 6
4 8 1 8 6 9
1 5 2 9 4 5

For every line, I want to subtract the first field from the second, the third from the fourth, and so on, depending on the number of fields (always even). Then I want to report the lines for which the difference for every pair exceeds a certain limit (say 2). I should also be able to report the next best lines, i.e., lines in which one pairwise comparison fails to meet the limit but all other pairs meet it.

From the above example, if I set the limit to 2, my output file should contain the best lines:

2 5 3 7 1 6    # because (5-2), (7-3), (6-1) are all > 2
4 8 1 8 6 9    # because (8-4), (8-1), (9-6) are all > 2 

and the next best line(s):

1 5 2 9 4 5    # because except (5-4), both (5-1) and (9-2) are > 2

My current approach is to read every line, save each field as a variable, and do the subtraction, but I don't know how to proceed further.

Thanks,

Upvotes: 3

Views: 687

Answers (5)

plhn

Reputation: 5263

If you can use awk:

$ cat del1
5 6 5 6 3 5
2 5 3 7 1 6
4 8 1 8 6 9
1 5 2 9 4 5
1 5 2 9 4 5 3 9

$ cat del1 | awk '{
> printf "%s _ ",$0; 
> for(i=1; i<=NF; i+=2){
>     printf "%d ",($(i+1)-$i)}; 
>     print NF 
> }' | awk '{
> upper=0; 
> for(i=1; i<=($NF/2); i++){ 
>     if($(NF-i)>threshold) upper++
> }; 
> printf "%d _ %s\n", upper, $0}' threshold=2 | sort -nr
3 _ 4 8 1 8 6 9 _ 4 7 3 6
3 _ 2 5 3 7 1 6 _ 3 4 5 6
3 _ 1 5 2 9 4 5 3 9 _ 4 7 1 6 8
2 _ 1 5 2 9 4 5 _ 4 7 1 6
0 _ 5 6 5 6 3 5 _ 1 1 2 6

You can process the result further according to your needs. The result is sorted in order of 'goodness': the first column is the number of pairs whose difference exceeds the threshold.
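As one possible way to do that further processing, here is a minimal sketch. It assumes the input has exactly the shape printed above (count, original fields, then the differences followed by the field count, separated by " _ "). The function name `label_goodness` is made up for illustration.

```shell
# Hypothetical helper: reads lines shaped like
#   "<count> _ <original fields> _ <differences> <field count>"
# on stdin and labels each original line as best or next best.
label_goodness() {
    awk -F ' _ ' '
    {
        n = split($3, d, " ")   # d[1..n-1] are the differences, d[n] is the field count
        pairs = d[n] / 2        # number of pairs on the original line
        fails = pairs - $1      # pairs that did not exceed the threshold
        if (fails == 0)
            print "best: " $2
        else if (fails == 1)
            print "next best: " $2
    }'
}
```

Appending `| label_goodness` to the pipeline above should then print only the best and next best lines. Comparing the count against the number of pairs matters because lines can have different lengths: the 8-field line has a count of 3 but four pairs, so it would be labelled next best rather than best.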

Upvotes: 0

F. Hauri - Give Up GitHub

Reputation: 70792

Yet another bash version:

First, a check function that returns nothing but a result code:

function getLimit() {
    local pairs=0 count=0 limit=$1 wantdiff=$2
    shift 2
    while [ "$1" ] ;do
        [ $(( $2-$1 )) -gt $limit ] && : $((count++))   # -gt: the difference must exceed the limit
        : $((pairs++))
        shift 2
      done
    test $((pairs-count)) -eq $wantdiff
}

Then:

while read line ;do getLimit 2 0 $line && echo $line;done <file
2 5 3 7 1 6
4 8 1 8 6 9

and

while read line ;do getLimit 2 1 $line && echo $line;done <file
1 5 2 9 4 5
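If reading the file once per category is a concern, the same idea can be sketched as a single pass. This is only an illustration under the question's assumptions (integer fields, an even number of them per line); the names `countFails` and `classify` are made up here.

```shell
# Hypothetical helper: prints how many pairs fail to exceed the limit.
# Usage: countFails LIMIT a1 b1 a2 b2 ...
countFails() {
    local limit=$1 fails=0
    shift
    while [ $# -ge 2 ]; do
        [ $(( $2 - $1 )) -gt "$limit" ] || fails=$((fails + 1))
        shift 2
    done
    echo "$fails"
}

# Hypothetical helper: reads lines on stdin and labels them in one pass.
classify() {
    local limit=$1 line
    while read -r line; do
        [ -n "$line" ] || continue            # skip empty lines
        # $line is left unquoted on purpose so it splits into fields
        case $(countFails "$limit" $line) in
            0) echo "best: $line" ;;
            1) echo "next best: $line" ;;
        esac
    done
}
```

Calling `classify 2 <file` should label the two best lines and the one next best line from the sample, and print nothing for the remaining line.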

Upvotes: 1

doubleDown

Reputation: 8398

This prints the "best" lines to the file "best" and the "next best" lines to the file "nextbest":

awk '
{
        fail_count=0
        for (i=1; i<NF; i+=2){
                if ( ($(i+1) - $i) <= threshold )
                        fail_count++
        }
        if (fail_count == 0)
                print $0 > "best"
        else if (fail_count == 1)
                print $0 > "nextbest"
}
' threshold=2 inputfile

Pretty straightforward stuff.

  1. Loop through fields 2 at a time.
  2. If (next field - current field) does not exceed threshold, increment fail_count
  3. If that line's fail_count is zero, it belongs to the "best" lines; else if its fail_count is one, it belongs to the "next best" lines.

Upvotes: 3

Rody Oldenhuis

Reputation: 38032

Here's a bash way to do it:

#!/bin/bash

threshold=$1
shift
file="$@"

a=($(cat "$file"))
b=$(( ${#a[@]}/$(cat "$file" | wc -l) ))

for ((r=0; r<${#a[@]}/b; r++)); do
    br=$((b*r))
    for ((c=0; c<b; c+=2)); do

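        # Compare arithmetically; inside [[ ]] the < operator compares strings.
        # A pair fails unless its difference exceeds the threshold.
        if (( ${a[br + c+1]} - ${a[br + c]} <= threshold )); then
            break; fi

        if (( c + 2 == b )); then
            echo ${a[@]:$br:$b}; fi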

    done
done

Usage:

$ ./script.sh 2 yourFile.txt
2 5 3 7 1 6
4 8 1 8 6 9

This output can then easily be redirected:

$ ./script.sh 2 yourFile.txt > output.txt

NOTE: this does not work properly if the file has empty lines between the data lines, because the number of fields per line is computed from the line count. But I'm sure the above will get you well on your way.
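One way around the empty-line limitation, as a sketch: strip blank lines before the script sees the file. The helper name `strip_blank_lines` and the file name `cleaned.txt` are made up for illustration.

```shell
# Hypothetical helper: drop lines that contain no fields
# (empty or whitespace-only). Reads the named files, or stdin if none are given.
strip_blank_lines() {
    awk 'NF' "$@"
}

# Possible usage, since the script expects a file name:
#   strip_blank_lines yourFile.txt > cleaned.txt
#   ./script.sh 2 cleaned.txt
```

The awk pattern `NF` is true only for lines with at least one field, so whitespace-only lines are dropped along with truly empty ones.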

Upvotes: 3

Tayacan

Reputation: 1826

I probably wouldn't do that in bash. Personally, I'd do it in Python, which is generally good for those small quick-and-dirty scripts.

If you have your data in a text file, you can read here about how to get that data into Python as a list of lines. Then you can use a for-loop to process each line:

threshold = 2
results = []
for line in content:  # content: the list of lines read from the file
    numbers = [int(n) for n in line.split()]  # Split the line into a list of numbers.
    pairs = zip(numbers[::2], numbers[1::2])  # Pair up the numbers two and two.
    result = [y - x for (x, y) in pairs]  # Subtract the first number in each pair from the second.
    # Keep the line only if it is non-empty and every difference exceeds the threshold.
    if result and all(d > threshold for d in result):
        results.append(numbers)

Upvotes: 1
