Reputation: 309
I've hit a mental block and am looking for guidance. I have two files. I need to compare two fields in the first file against the entries in the second file and, where they match, write the substituted line to a third file:
File one:
tail -n 1 file_A
09/03/2013:11:55:49 S [email protected] [email protected] ThisPlace:Washington 0 09/03/2013:12:05:27 578
File two:
head -n 2 file_B
7187187187,"OfficeA"
9999999,"OfficeB"
Desired result:
more desired_result
09/03/2013:11:55:49 S 7187187187@OfficeA 9999999@OfficeB ThisPlace:Washington 0 09/03/2013:12:05:27 578
I thought about a shell script that loops over each entry, but I am sure there is a way to match two fields per line using awk.
awk -F"@" 'NR==FNR{a[$1]=$2;next}{if ($2 in a)print a[$2]";"$0}' fileA fileB
Nope. I have tried a variety of different ORS, FS, and NR combinations and I am stumped. I am sure I am overlooking something.
EDITED
@jotne
$ more fileA
08/22/2013:09:21:33 E [email protected] [email protected] EProv:EastProvidence_RI 1 08/22/2013:09:21:33 0
09/03/2013:05:09:58 S [email protected] [email protected] Tacoma:Washington 0 09/03/2013:13:29:31 29973
09/03/2013:10:46:19 S [email protected] [email protected] Boston:MA 0 09/03/2013:12:01:28 4509
09/03/2013:10:49:54 S [email protected] [email protected] Gaith:MD 0 09/03/2013:12:09:26 4772
09/03/2013:10:49:54 S [email protected] [email protected] Balt:MD 1 09/03/2013:12:09:26 4772
$ more fileB
3333444488,"Providence_Route"
5556666777,"Kenosha_Route"
9999988888,"Chitown_Route"
7778889999,"Chitown_Route"
Here is the gist of it. These are telephone numbers (CDR) I am trying to match up. The numbers are listed in fileB and are structured as:
telnumber,"Which_Session_Border_Controller_Its_Routing_Through"
What I am trying to say is: for every number in fileA's $3 or $4, look it up in fileB; if there is a match, replace whatever comes after the @ sign with the name of the route.
While it may seem easier for me to just perl -pi -e 's:10.20.30.1:ChitownRoute:g' fileA
the addresses I used are sanitized and the real ones fluctuate, so even attempting to fix those is a headache in itself. I would post more examples, but fileA is 1GB and fileB has about 44k lines.
Upvotes: 1
Views: 66
Reputation: 41446
Here is an awk version:
awk 'FNR==NR {split($1,f,"[,\"]");a[f[1]]=f[3];next} {for (i in a) for (j=1;j<=NF;j++) if ($j~i) $j=i"@"a[i]}1' fileB fileA
09/03/2013:11:55:49 S 7187187187@OfficeA 9999999@OfficeB ThisPlace:Washington 0 09/03/2013:12:05:27 578
This solution loops through every field of each line in fileA and tests it against every number loaded from fileB.
More readable:
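awk '
FNR==NR {                     # first file (fileB): map number -> route name
    split($1,f,"[,\"]")       # f[1] = number, f[3] = route name without quotes
    a[f[1]]=f[3]
    next}
{
    for (i in a)              # every number loaded from fileB
        for (j=1;j<=NF;j++)   # every field of the current fileA line
            if ($j~i)         # the field matches the number (used as a regex)
                $j=i"@"a[i]
}
1                             # print the (possibly modified) line
' fileB fileA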
With fileA:
09/03/2013:11:55:49 S [email protected] [email protected] ThisPlace:Washington 0 09/03/2013:12:05:27 578
and fileB:
7187187187,"OfficeA"
9999999,"OfficeB"
This gives:
09/03/2013:11:55:49 S 7187187187@OfficeA 9999999@OfficeB ThisPlace:Washington 0 09/03/2013:12:05:27 578
Result from new files:
08/22/2013:09:21:33 E [email protected] 3333444488@Providence_Route EProv:EastProvidence_RI 1 08/22/2013:09:21:33 0
09/03/2013:05:09:58 S 5556666777@Kenosha_Route [email protected] Tacoma:Washington 0 09/03/2013:13:29:31 29973
09/03/2013:10:46:19 S 3333444488@Providence_Route [email protected] Boston:MA 0 09/03/2013:12:01:28 4509
09/03/2013:10:49:54 S [email protected] 9999988888@Chitown_Route Gaith:MD 0 09/03/2013:12:09:26 4772
09/03/2013:10:49:54 S [email protected] 7778889999@Chitown_Route Balt:MD 1 09/03/2013:12:09:26 4772
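Since fileA is about 1GB and fileB has about 44k lines, testing every field against every number may be slow. A possible variation is sketched here, assuming the numbers only ever appear in $3 and $4 directly in front of the @ sign, as in the question. It looks each number up in the array once instead of looping over the whole array. The function name rewrite_routes and the hosts in the sample data are made up for illustration.

```shell
#!/bin/sh
# rewrite_routes FILE_B FILE_A  (hypothetical helper name for this sketch)
rewrite_routes() {
    awk '
    FNR==NR {                       # first file (fileB): map number -> route
        split($0, f, /[,"]/)        # f[1] = number, f[3] = route without quotes
        route[f[1]] = f[3]
        next
    }
    {
        for (j = 3; j <= 4; j++) {  # only the two number@host fields
            n = $j
            sub(/@.*/, "", n)       # keep the part before the "@"
            if (n in route)         # exact lookup, no regex matching
                $j = n "@" route[n]
        }
        print
    }' "$1" "$2"
}

# Demo with made-up data; the hosts after "@" are placeholders.
tmp=$(mktemp -d)
printf '%s\n' '7187187187,"OfficeA"' '9999999,"OfficeB"' > "$tmp/fileB"
printf '%s\n' \
  '09/03/2013:11:55:49 S 7187187187@10.0.0.1 9999999@10.0.0.2 ThisPlace:Washington 0 09/03/2013:12:05:27 578' \
  '09/03/2013:10:46:19 S 1112223333@10.0.0.3 7187187187@10.0.0.4 Boston:MA 0 09/03/2013:12:01:28 4509' \
  > "$tmp/fileA"
rewrite_routes "$tmp/fileB" "$tmp/fileA"
rm -rf "$tmp"
```

Because the lookup is an exact match on the text before the @, a number that merely contains one of the fileB numbers is left alone, and lines without a match are printed unchanged.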
Upvotes: 0
Reputation: 77085
Try this:
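awk -F'[, ]' '
NR==FNR{
gsub(/\"/,"",$2);    # strip the quotes around the route name
a[$1]=$2;
next
}
{
# rewrite $3 and $4 only when the number before the @ is listed in fileB
for (k=3;k<=4;k++) {
split($k,t,/@/);
if (t[1] in a) $k=t[1]"@"a[t[1]]
}
}1' fileB fileA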
09/03/2013:11:55:49 S 7187187187@OfficeA 9999999@OfficeB ThisPlace:Washington 0 09/03/2013:12:05:27 578
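As for why the attempt in the question prints nothing, a likely explanation is the field separator. With -F"@", the first field of a fileA line is everything up to the first @, not the number on its own, and a fileB line has no @ at all, so its $2 is always empty. The sketch below only shows how awk splits the two kinds of lines. The helper name at_fields and the hosts in the sample line are made up for illustration.

```shell
#!/bin/sh
# at_fields: hypothetical helper that reports how awk splits stdin on "@".
at_fields() {
    awk -F"@" '{ printf "NF=%d first=[%s] second=[%s]\n", NF, $1, $2 }'
}

# A line in the shape of fileA (placeholder hosts after the "@").
printf '%s\n' '09/03/2013:11:55:49 S 7187187187@10.0.0.1 9999999@10.0.0.2 ThisPlace:Washington 0' | at_fields
# prints: NF=3 first=[09/03/2013:11:55:49 S 7187187187] second=[10.0.0.1 9999999]

# A line in the shape of fileB: no "@", so it is a single field.
printf '%s\n' '7187187187,"OfficeA"' | at_fields
# prints: NF=1 first=[7187187187,"OfficeA"] second=[]
```

The array keys built from fileA therefore include the timestamp and the S flag, and the `$2 in a` test on fileB lines compares an empty string against them, so no line is ever printed.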
Upvotes: 1