grep, awk or sed to Match and Combine Two Files

Question

I need to match one field from two files and then combine fields from both files.

File 1 will be huge and could have close to 12 million lines. File 2 will only have about 130 lines.

I tried -

egrep -f file1 file2

This only gives me the matching information that's containted in file2.

Is there a way to match information from both files and then combine portions from both files?

Also, I need something that's efficient since file1 is quite large.

All fields are pipe delimited.

Below is the contents of both files as well as the output file. I've included a key to each file.

Any guidance would be greatly appreciated.

File 1:

1234567890|1|6664|NA|KY|0|0|18
2345678901|22|1958|NA|MT|0|0|77
3456789012|3|991B|NA|EM|0|0|13
4567890123|3223|154E|NA|NY|0|0|52
5678901234|8|008F|NA|AZ|0|0|04
6789012345|27|D983|NA|IT|0|0|0

File 1 Key

Field 1: 10 digit number
Field 2: variable length numeric characters
Field 3: 4 alpha-numeric characters
Field 4: 2 alpha characters
Field 5: 2 alpha characters
Field 6: 1 digit number
Field 7: 1 digit number
Field 8: variable length numeric characters

File 2:

6664|Dr. Zhivago|Russian Literature|421
1958|Dr. Spock|Child Psychology|308
991B|Dr. Martin|Shoe Design|406
3223|Dr. House|Diagnostic Skills|440
008F|Dr. Evil|Global Political Economy|326
D983|Dr. Quincy|Processes of Science: Forensics|156

File 2 Key

Field 1: 4 alpha-numeric characters
Field 2: Variable length alpha-numeric characters
Field 3: Variable length alpha-numeric characters
Field 4: 3 digit number

Required Output:

1234567890|1|6664|Russian Literature
2345678901|22|1958|Child Psychology
3456789012|3|991B|Shoe Design
4567890123|3223|154E|Diagnostic Skills
5678901234|8|008F Global Political Economy
6789012345|27|D983|Processes of Science: Forensics

Output Key

From File 1 –

Field 1: 10 digit number
Field 2: variable length numeric characters
Field 3: 4 alpha-numeric characters

From File 2 –

Field 4: Variable length alpha-numeric characters

grep, awk or sed to Match and Combine Two Files

Answers (1)

Related Questions