Reputation: 1202

How to remove rows of one file whose first value does not appear in another file?

I have one file, File1, that contains several columns of integers.

I have another file, File2, with a single column of unique integers:

1
3
4

Both files are extremely large. I want to remove lines of File1 whose first column value doesn't appear in File2.

Upvotes: 0

Answers (1)

vrs

Reputation: 1982

You can do this by looping through the lines of File1, extracting the first number from each line, and looking for an exact match among the numbers in File2.

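#!/bin/bash

# Print each line of File1 whose first field appears as a whole line in File2.
while IFS= read -r line; do
    # First whitespace-separated field of the current line
    read -r num _ <<< "$line"
    # -x: match the whole line, -F: fixed string, -q: exit status only
    if [[ -n $num ]] && grep -qxF -- "$num" File2; then
        printf '%s\n' "$line"
    fi
done < File1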

You can save the output of this script to a temporary file and then use it to overwrite the original File1.
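Since both files are described as extremely large, running grep once per line of File1 may be slow. A possible alternative is a single pass with awk. This is only a sketch: it assumes the values in File2 fit in memory and that File1 is whitespace-separated. The function name filter_by_first_column is made up for illustration.

```shell
#!/bin/bash

# filter_by_first_column KEYS DATA  (hypothetical helper name)
# Prints the lines of DATA whose first field appears in column 1 of KEYS.
filter_by_first_column() {
    awk '
        # First file (KEYS): remember every value found in column 1.
        FILENAME == ARGV[1] { keep[$1]; next }
        # Second file (DATA): print the line if its first field was remembered.
        $1 in keep
    ' "$1" "$2"
}

# Example usage, writing to a temporary file before replacing File1:
# filter_by_first_column File2 File1 > File1.tmp && mv File1.tmp File1
```

Each file is read only once here, so the running time grows roughly with the combined size of the two files rather than with their product.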

Upvotes: 1
