arieffirdaus

Reputation: 63

Combine two files and keep the non match lines

I am studying the sed command, but I am having a problem combining two files.

file1.txt

A       1
C       3
E       5

file2.txt

1 John Lennon
2 Mariah carey
3 Cool & The Gang
4 Westlife
5 Red Hot Chili Peppers

desired output

1 John Lennon A
2 Mariah carey
3 Cool & The Gang C
4 Westlife
5 Red Hot Chili Peppers E

I tried writing an awk script like this:

awk 'FNR==NR{seen[$1]=$2; next} $1 in seen{seen[$1]=seen[$1] OFS $2} END{ for (e in seen) print e, seen[e]}' file2.txt file1.txt | sort -V

but the output shows only the first word of each singer's name (John, Mariah, Cool, Westlife, and Red) instead of the full name. Is something wrong with my script?

Upvotes: 0

Views: 94

Answers (3)

Shawn

Reputation: 52344

If the columns of the two files are separated by tabs instead of spaces (it looks like the first file is; I can't tell for the second, because SO's markdown is not tab-friendly), it's a trivial join:

$ join -12 -21 -o 0,2.2,1.1 -t$'\t' -a2 <(sort -t$'\t' -k2,2 file1.txt) <(sort -t$'\t' -k1,1 file2.txt)
1   John Lennon A
2   Mariah carey    
3   Cool & The Gang C
4   Westlife    
5   Red Hot Chili Peppers   E

(join requires its files to be sorted lexicographically on the join column, not numerically, hence the sorts).
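To illustrate why the sort step matters, here is a small sketch with made-up keys (2, 10 and 1; they are not from the question's files). The question's single-digit keys happen to sort the same way both ways, but once a key reaches two digits the two orderings differ, and join expects the plain `sort` ordering:

```shell
# Hypothetical keys, chosen only to show the difference between orderings.
lexical=$(printf '%s\n' 2 10 1 | sort | paste -sd' ' -)
numeric=$(printf '%s\n' 2 10 1 | sort -n | paste -sd' ' -)

echo "lexical: $lexical"   # lexical: 1 10 2
echo "numeric: $numeric"   # numeric: 1 2 10
```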

If there's just a space between the number and the band in file2, convert it to a tab first with sed:

join -12 -21 -o 0,2.2,1.1 -t$'\t' -a2 <(sort -t$'\t' -k2,2 file1.txt) <(sed 's/ /\t/' file2.txt | sort -t$'\t' -k1,1)
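One caveat: `\t` in the replacement part of `s///` is understood by GNU sed but is not guaranteed by POSIX, and BSD sed (for example on macOS) inserts a literal `t` instead. A possible workaround, sketched here on a single sample line, is to put a real tab character into a shell variable (the variable names are just for illustration):

```shell
# Hold a literal tab character in a variable.
tab=$(printf '\t')

# Replace only the first space on the line with the tab.
converted=$(printf '%s\n' '3 Cool & The Gang' | sed "s/ /$tab/")

printf '%s\n' "$converted"
```

The same `sed "s/ /$tab/" file2.txt` could then be used inside the process substitution above.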

Upvotes: 1

anubhava

Reputation: 784998

This can be done with a fairly simple two-step process in awk. There is no need for sort, because file2.txt is read in the second phase and its lines are printed in their original order:

awk 'FNR==NR{seen[$2]=$1; next} $1 in seen{$0 = $0 OFS seen[$1]} 1' file1.txt file2.txt

1 John Lennon A
2 Mariah carey
3 Cool & The Gang C
4 Westlife
5 Red Hot Chili Peppers E
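For readers who want the two phases spelled out, here is a commented sketch of the same idea. It recreates the question's sample data in a temporary directory; the array name `letter` and the variable `merged` are illustrative choices, and file1.txt is assumed to be tab-separated.

```shell
# Recreate the sample files from the question in a temporary directory.
tmp=$(mktemp -d)
printf 'A\t1\nC\t3\nE\t5\n' > "$tmp/file1.txt"
printf '%s\n' '1 John Lennon' '2 Mariah carey' '3 Cool & The Gang' \
  '4 Westlife' '5 Red Hot Chili Peppers' > "$tmp/file2.txt"

merged=$(awk '
  # Phase 1 (file1.txt): map each number to its letter, then skip ahead.
  FNR == NR { letter[$2] = $1; next }
  # Phase 2 (file2.txt): append the letter when the number has one.
  $1 in letter { $0 = $0 OFS letter[$1] }
  # Print every file2.txt line, whether it matched or not.
  { print }
' "$tmp/file1.txt" "$tmp/file2.txt")

printf '%s\n' "$merged"
rm -rf "$tmp"
```

Because each file2.txt line is kept whole in `$0` and only extended, the full name survives regardless of how many words it contains.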

Upvotes: 2

RavinderSingh13

Reputation: 133458

You can use the following code; I have made minor changes to your attempt.

awk '
FNR==NR{
  val=$1
  $1=""
  sub(/^ +/,"")
  seen[val]=$0
  next
}
$2 in seen{
  print $2,seen[$2],$1
  b[$2]
  next
}
END{
  for(i in seen){
    if(!(i in b)){
      print i,seen[i]
    }
  }
}
' file2.txt file1.txt | sort -V

Output will be as follows.

1 John Lennon A
2 Mariah carey
3 Cool & The Gang C
4 Westlife
5 Red Hot Chili Peppers E

Problems with the OP's attempt:

  • The OP correctly uses $1 as the index of the seen array, but stores only $2 as the value. $2 is just the first word of the name (John, Mariah, and so on), so the rest of the name is lost.
  • That is why the OP's attempt does not print the complete names.
  • The OP also checks $1 of file1.txt against the seen array. In file1.txt the number is in the second column, so the check should use $2.
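The first point can be seen on a single line. This is a minimal sketch that assumes a single space after the number, as in the file2.txt sample; the variable names are only for illustration:

```shell
# One sample line in the format of file2.txt.
line='5 Red Hot Chili Peppers'

# $2 is only the second whitespace-separated field.
second_field=$(printf '%s\n' "$line" | awk '{ print $2 }')

# Removing the leading number and the spaces after it keeps the whole name.
full_name=$(printf '%s\n' "$line" | awk '{ sub(/^[^ ]+ +/, ""); print }')

echo "$second_field"   # Red
echo "$full_name"      # Red Hot Chili Peppers
```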

Upvotes: 1
