Reputation: 699
I have two files, which consists of two columns. In each file in the first column you have a MGD_number and in the second column you have SOL_number. In File1 you have MGD226-MGD450 In File2 you have MGD451-MGD675 I want to make a new file in which base on the same SOL_number from a different file you will fuse value of MGD from the one first from the second file (and put in $2) by using awk or in bash. For example MGD226 SOL17 in file1 and MGD456 SOL81 in file2 -> MGD226 MGD456 in file3 This is prohibited: MGD251 SOL17 in file1 and MGD378 SOL81 in file1 (the same file!) -> MGD251 MGD378 in file3
These are part of my two files
File1 input
MGD278 SOL51
MGD341 SOL51
MGD412 SOL78
MGD312 SOL78
File2 input
MGD489 SOL51
MGD572 SOL51
MGD613 SOL78
MGD684 SOL16
Expected output File3
MGD278 MGD489
MGD278 MGD572
MGD341 MGD489
MGD341 MGD572
MGD412 MGD613
MGD312 MGD613
Edit: I get an error on some data
I tried
join -j 2 file1 file2 | awk '{print $2, $3}' > file3
but its not working on some of my data, for example
file1
MGD226 SOL4454
MGD226 SOL9311
MGD226 SOL7369
MGD226 SOL1813
MGD227 SOL1696
file2
MGD451 SOL4198
MGD451 SOL1429
MGD452 SOL4803
MGD452 SOL1696
MGD452 SOL7369
expected output
MGD226 MGD452
MGD227 MGD452
I get an error
join: file2:4: not sorted: MGD452 SOL1696
join: file1:3: not sorted: MGD226 SOL7369
join: input is not sorted
I tried sort it like
sort -k 1 -n
or sort -k 2 -n
, but I get wrong output
Do you know what I do wrong?
Upvotes: 0
Views: 48
Reputation: 6073
When I run
join -j 2 file1 file2 | awk '{print $2, $3}' > file3
on the files file1
and file2
as above, I get
MGD278 MGD489
MGD278 MGD572
MGD341 MGD489
MGD341 MGD572
MGD412 MGD613
MGD312 MGD613
in file3
, which is what you want, assuming that your input files are sorted on the second field. If that is not the case, you will first have to sort these.
sort -k 2 file1 > file1.sorted
sort -k 2 file2 > file2.sorted
join -j 2 file1.sorted file2.sorted | awk '{print $2, $3}' > file3
rm file1.sorted file2.sorted
Upvotes: 0