Reputation: 1389
I have two input files such that:
file1
123
456
789
file2
123|foo
456|bar
999|baz
I need to copy the lines from file2 whose keys are in file1, so the end result is:
file3
123|foo
456|bar
Right now, I'm using a shell script that loops through the key file and runs grep for each key:
grep "^${keys[$keyindex]}|" $datafile >&4
But as you can imagine, this is extremely slow. The key file (file1) has approximately 400,000 keys and the data file (file2) has about 750,000 rows. Is there a better way to do this?
Upvotes: 0
Views: 666
Reputation: 246837
join is the best solution, if sorting is OK. An awk solution:
awk -F '|' '
FILENAME==ARGV[1] {key[$1]; next}  # first file: remember each key
$1 in key                          # second file: print lines whose key was seen
' file1 file2
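For completeness, a minimal sketch of the join approach mentioned above (the sample data and the `.sorted` file names are illustrative; join requires both inputs sorted on the join field):

```shell
# Sample inputs from the question
printf '123\n456\n789\n' > file1
printf '123|foo\n456|bar\n999|baz\n' > file2

# join needs both inputs sorted on the join field (here: the first field)
sort file1 > file1.sorted
sort -t '|' -k1,1 file2 > file2.sorted

# -t '|' sets the field delimiter; the join field defaults to field 1
join -t '|' file1.sorted file2.sorted > file3
```

This prints only the rows of file2 whose key appears in file1 (123|foo and 456|bar).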
Upvotes: 0
Reputation: 137332
I would use something like Python, which would process it pretty fast if you used an optimized data type like set. Not sure of your exact requirements, so you would need to adjust accordingly.
#!/usr/bin/python
# Create a set to store all of the keys in file1
keys = set()
for line in open('file1', 'r'):
    keys.add(line.strip())

# Open the output file to write to
file3 = open('file3', 'w')

# Loop over file2, and only write out the lines whose key is in the set
for line in open('file2', 'r'):
    if '|' not in line:
        continue
    key = line.split('|', 1)[0]
    if key in keys:
        file3.write(line)
file3.close()
Upvotes: 0