Greg

Reputation: 1389

Shell script - copy lines from file by key

I have two input files such that:

file1
123
456
789

file2
123|foo
456|bar
999|baz

I need to copy the lines from file2 whose keys are in file1, so the end result is:

file3
123|foo
456|bar

Right now, I'm using a shell script that loops through the key file and runs grep once for each key:

grep "^${keys[$keyindex]}|" $datafile >&4

But as you can imagine, this is extremely slow. The key file (file1) has approximately 400,000 keys and the data file (file2) has about 750,000 rows. Is there a better way to do this?

Upvotes: 0

Views: 666

Answers (3)

glenn jackman

Reputation: 246837

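join is the best solution if sorting the files is acceptable. Otherwise, here is an awk solution that stores the keys from file1 in an array and then prints each line of file2 whose first field is one of those keys: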

awk -F \| '
    FILENAME==ARGV[1] {key[$1];next} 
    $1 in key
' file1 file2

Upvotes: 0

gahooa

Reputation: 137332

I would use something like Python, which handles this quickly if you keep the keys in a set, since membership tests on a set take roughly constant time. I'm not sure of your exact requirements, so adjust as needed.

#!/usr/bin/env python3

# Load every key from file1 into a set for fast membership tests
with open('file1') as key_file:
    keys = {line.strip() for line in key_file}

# Copy each line of file2 whose key (the text before the first '|') is in the set
with open('file2') as data_file, open('file4', 'w') as out_file:
    for line in data_file:
        key, sep, _ = line.partition('|')
        # sep is empty when the line has no '|', so such lines are skipped
        if sep and key.strip() in keys:
            # Write the whole line, matching the file3 example in the question
            out_file.write(line)

Upvotes: 0

gpojd

Reputation: 23065

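You can try using join. Note that it expects both files to be sorted on the key field, so run them through sort first if they are not: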

join -t'|' file1.txt file2.txt > file3.txt
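
To illustrate why join wants sorted input, here is a minimal Python sketch of the kind of single-pass merge a sorted join can perform. It is not join's actual implementation. The function name merge_join is made up for illustration, the sample lists are copied from the question, and the sketch assumes both inputs are sorted as plain strings and that the key list has no duplicates.

```python
def merge_join(keys, rows, sep="|"):
    """Yield rows whose first field appears in keys.

    Assumes both inputs are sorted ascending by key (compared as
    strings) and that keys contains no duplicates.
    """
    keys = iter(keys)
    key = next(keys, None)
    for row in rows:
        row_key = row.split(sep, 1)[0]
        # Advance the key cursor until it catches up with the current row
        while key is not None and key < row_key:
            key = next(keys, None)
        if key is None:
            break  # no keys left, so no later row can match
        if key == row_key:
            yield row


# Sample data from the question (hypothetical in-memory stand-ins for the files)
file1 = ["123", "456", "789"]
file2 = ["123|foo", "456|bar", "999|baz"]
result = list(merge_join(file1, file2))
print(result)  # ['123|foo', '456|bar']
```

Because each input is read once from front to back, the work grows with the combined size of the two files instead of with their product, which is where the grep-per-key loop loses so much time.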

Upvotes: 4
