pmdaly

Reputation: 1202

How to remove rows of one file whose first value does not appear in another file?

I have one file that has columns of integers -> File1

1 2 3
2 2 2
3 2 1
3 1 4
4 1 4
5 0 0

I have another file with a single column of (unique) integers -> File2

1
3
4

Both files are extremely large. I want to remove the lines of File1 whose first-column value doesn't appear in File2, so that the result is:

1 2 3
3 2 1
3 1 4
4 1 4

Upvotes: 0

Views: 56

Answers (1)

vrs

Reputation: 1982

You can do this by looping over the lines of File1, extracting the first number from each line, and looking for an exact match among the numbers in File2. Note that this runs grep once per line of File1, so it will be slow on very large files.

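#!/bin/bash

re='^[0-9]+$'

# Read File1 line by line, preserving whitespace and backslashes.
while IFS= read -r line; do
    # First whitespace-separated field of the line.
    read -r num _ <<< "$line"
    # Skip blank lines and lines that do not start with an integer.
    [[ $num =~ $re ]] || continue
    # -x: whole-line match, -F: fixed string, -q: exit status only.
    if grep -qxF -- "$num" File2; then
        printf '%s\n' "$line"
    fi
done < File1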

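Redirect the output of this script to a temporary file, then move that file over the original File1. Do not redirect straight into File1, because the shell would truncate it before the script reads it.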
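Since both files are described as extremely large, rescanning File2 for every line of File1 may be too slow. A possible alternative is a single awk pass that first loads the File2 values into an array and then prints only the File1 lines whose first field is in it. This is a sketch, not a tested solution for your data: the function name `filter_by_first_column` is made up for illustration, and it assumes File2 is non-empty and that its distinct values fit in memory.

```shell
#!/bin/bash

# Hypothetical helper (name chosen for illustration only).
# Usage: filter_by_first_column KEYS_FILE DATA_FILE
# Prints the lines of DATA_FILE whose first field occurs in KEYS_FILE.
# Assumes KEYS_FILE is non-empty; otherwise NR == FNR would also hold
# while reading DATA_FILE and nothing would be printed.
filter_by_first_column() {
    local keys_file=$1 data_file=$2
    awk '
        NR == FNR { keep[$1]; next }   # first file: remember each key
        $1 in keep                     # second file: print lines with a known key
    ' "$keys_file" "$data_file"
}

# Example call, writing to a temporary file before replacing File1:
#   filter_by_first_column File2 File1 > File1.tmp && mv File1.tmp File1
```

The keys file goes first on purpose: `NR == FNR` is true only while awk is reading its first input file, so the array is complete before any line of File1 is tested.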

Upvotes: 1
