slayedbylucifer
slayedbylucifer

Reputation: 23492

join/awk: Join 2 files on a column

I want to join below 2 files on 2nd column.

# cat vmid
10   win7-x64-17   [datastore1]   win7-kms-17/win7-template_1.vmx   windows7_64Guest   vmx-08
11   win7-x64-18   [datastore1]   win7-kms-18/win7-template_1.vmx   windows7_64Guest   vmx-08
4   win7-x64-11   [datastore1]   win7-kms-11/win7-template_1.vmx   windows7_64Guest   vmx-08
5   win7-x64-12   [datastore1]   win7-kms-12/win7-template_1.vmx   windows7_64Guest   vmx-08
6   win7-x64-13   [datastore1]   win7-kms-13/win7-template_1.vmx   windows7_64Guest   vmx-08
7   win7-x64-14   [datastore1]   win7-kms-14/win7-template_1.vmx   windows7_64Guest   vmx-08
8   win7-x64-15   [datastore1]   win7-kms-15/win7-template_1.vmx   windows7_64Guest   vmx-08
9   win7-x64-16   [datastore1]   win7-kms-16/win7-template_1.vmx   windows7_64Guest   vmx-08

# cat mac
00:0c:29:2f:00:d8   win7-kms-11
00:0c:29:db:f1:15   win7-kms-12
00:0c:29:ca:a3:d3   win7-kms-13
00:0c:29:f1:5e:ef   win7-kms-14
00:0c:29:1a:55:a6   win7-kms-15
00:0c:29:77:2e:93   win7-kms-16
00:0c:29:ae:5f:7f   win7-kms-17
00:0c:29:20:b4:aa   win7-kms-18

Expected Output (be it unsorted, no worries):

10  win7-x64-17     00:0c:29:ae:5f:7f
11  win7-x64-18     00:0c:29:20:b4:aa
4  win7-x64-11      00:0c:29:2f:00:d8
5  win7-x64-12      00:0c:29:db:f1:15
6  win7-x64-13      00:0c:29:ca:a3:d3
7  win7-x64-14      00:0c:29:f1:5e:ef
8  win7-x64-15      00:0c:29:1a:55:a6
9  win7-x64-16      00:0c:29:77:2e:93

And below are my 2 tries. Both return nothing:

# awk 'NR==FNR {a[$2]=$1;next } ($2 in a) {print $2" "a[$2]" "$1 }' vmid mac
# join -j 2 <(sort -k 2 vmid) <(sort -k 2 mac)

I know that my commands are working fine because when I try it on a test input, both above commands worked as expected as below:

# cat f1
82  d   
83  r
10  k
12  s

# cat f2
m   r
b   d
p   k
p   s

# join -j 2 <(sort -k 2 f1) <(sort -k 2 f2)
d 82  b
k 10 p
r 83 m
s 12 p

# awk 'NR==FNR {a[$2]=$1;next } ($2 in a) {print $2" "a[$2]" "$1 }' f1 f2 
r 83 m
d 82 b
k 10 p
s 12 p

Any pointers would be really helpful.

Thanks.

Upvotes: 2

Views: 142

Answers (3)

hek2mgl
hek2mgl

Reputation: 157947

With join you can use the following command:

join -1 4 -2 2 <(tr '/' ' ' <vmid | sort -k4) <(sort -k2 mac) | cut -d' ' -f1,2,8

Explanation:

join -1 4 -2 2 joins the input files based on column 4 of the first input file and column 2 of the second input file.

However, join expects the files to be sorted by that columns to work as expected. Also we need to separate win7-kms-1 from win7-kms-1/win7-template_1.vmx meaning replacing the / by a space in order to present it as a separate column to join. You might prepare the files regarding this before calling join but you also can use process substitution (as I showed).

At the end I'm using cut to select only the columns of interest.


So far to the join command. I wanted to explain that. An alternative to solve your use case would be to use awk as you already mentioned. Actually I would suggest that.

You was almost there, but it should be:

awk -F'[[:space:]/]+' 'NR==FNR{s[$4]=$1;next}$2 in s{print s[$2],$2,$1}' vmid mac

I'm using one or more space or / characters to delimit fields. This makes it simple to access win7-kms-1 in field $4.

NR == FNR is true as long as the total line number equals the line number in the current file. Actually this is only true as long as we read the first input file. We use that to create a lookup table from vmid and store field one based on field four.

$2 in s{print s[$2],$2,$1} checks then for the lines of mac if they appear in the lookup and prints the values of interest in that case.

Upvotes: 2

Sobrique
Sobrique

Reputation: 53478

How about something like this (in perl):

#!/usr/bin/env perl

use strict;
use warnings; 

my %mac_lookup;

while ( <> ) {
   my @fields = split; 
   if ( $fields[0] =~ m/^\d{2}:/ ) { 
       $mac_lookup{$fields[1]} = $fields[0]
   }
   else {
       my ( $key ) = $fields[3] =~ m,([\w\-]+)/,;
       print join "\t", @fields[0,1], $mac_lookup{$key},"\n";
   }
}

This will work when supplied your file names on command line or via stdin.

  • iterate line by line
  • if it starts with \d{2}: assume it's a mac->host pair, and insert into %mac_lookup
  • Otherwise assume it's a data line
  • extract key from the 4th field (arrays start at 0 in perl)
  • use that key to read from the hash.

Giving:

10  win7-x64-17 00:0c:29:ae:5f:7f   
11  win7-x64-18 00:0c:29:20:b4:aa   
4   win7-x64-11 00:0c:29:2f:00:d8   
5   win7-x64-12 00:0c:29:db:f1:15   
6   win7-x64-13 00:0c:29:ca:a3:d3   
7   win7-x64-14 00:0c:29:f1:5e:ef   
8   win7-x64-15 00:0c:29:1a:55:a6   
9   win7-x64-16 00:0c:29:77:2e:93

Note - order will match file order, not sorted.

This will collapse to a one liner if desired (although IMO that's generally not a good thing - you get hard to read results).

perl -lane 'if($F[0] =~ m/^\d{2}:/){$M{$F[1]}=$F[0]}else{print join "\t", @F[0,1], $M{$F[3]=~s|/.*||r},"\n"}'

Upvotes: 0

Guru
Guru

Reputation: 16974

Using awk(assuming the words are always kms and x64):

awk 'NR==FNR{sub("kms","x64",$2);a[$2]=$1;next}{print $1 ,$2,a[$2];}' mac vmid

Upvotes: 1

Related Questions