Reputation: 1

String Manipulations in AWK

I am trying to match data from two files and create a new file with the results.

File 1 has data that looks like this:

19V17R1-wipedrive-2016.05.23-07.25PM-d0.pdf
19XPT32-wipedrive-2016.05.03-05.50AM-d0.pdf
19XPT32-wipedrive-2016.07.06-08.32PM-d0.pdf
1BC6062-wipedrive-2018.07.26-08.34AM-d0.pdf

File 2 just has the first 7 characters, like so:

19V17R1
1BC6062

The final file should look like this:

19V17R1 19V17R1-wipedrive-2016.05.23-07.25PM-d0.pdf
1BC6062 1BC6062-wipedrive-2018.07.26-08.34AM-d0.pdf

I can match the files by creating a file with just the first 7 characters and then doing:

awk 'FNR==NR{!a[$1]++;next}$0 in a' /RMAs.txt /sortedWipelogs.txt > matches.text

What I can't figure out is how to output the entire filename in the second column. Thanks.

Upvotes: 0

Answers (5)

kvantour

Reputation: 26471

There are many ways to do this. There is already a join answer. Here is a grep one:

$ grep -F -f file2 file1
19V17R1-wipedrive-2016.05.23-07.25PM-d0.pdf
1BC6062-wipedrive-2018.07.26-08.34AM-d0.pdf

Note that this can also match a key anywhere in a line, not just at the start, so use it only if you are certain of the format. You also do not really need the first column, since it just repeats the first 7 characters of the line. If you want it anyway, you can add it like this:

$ grep -F -f file2 file1 | awk '{print substr($0,1,7), $0 }'
19V17R1 19V17R1-wipedrive-2016.05.23-07.25PM-d0.pdf
1BC6062 1BC6062-wipedrive-2018.07.26-08.34AM-d0.pdf

or just

$ awk '(NR==FNR){a[$1];next}(substr($0,1,7) in a){ print substr($0,1,7), $0 }' file2 file1

or even shorter with - as the field separator (set only for file1, to avoid possible problems with blanks in file2):

$ awk '(NR==FNR){a[$1];next}($1 in a){ print $1, $0 }' file2 FS="-" file1
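
To make the caveat about grep -F concrete, here is a minimal sketch. It assumes a made-up extra line (copy-of-...) that contains a key in the middle of the name, and it assumes the keys contain no regex metacharacters. The temporary directory and the patterns file are just for the demo.

```shell
# Demo data in a throwaway directory (names are illustrative).
dir=$(mktemp -d)
cat > "$dir/file1" <<'EOF'
19V17R1-wipedrive-2016.05.23-07.25PM-d0.pdf
19XPT32-wipedrive-2016.05.03-05.50AM-d0.pdf
copy-of-1BC6062-wipedrive-2018.07.26-08.34AM-d0.pdf
1BC6062-wipedrive-2018.07.26-08.34AM-d0.pdf
EOF
printf '%s\n' 19V17R1 1BC6062 > "$dir/file2"

# Fixed-string search: also hits the key in the middle of a line.
unanchored=$(grep -F -f "$dir/file2" "$dir/file1")

# Prefix every key with ^ so it can only match at the start of a line.
# This drops -F, so the keys are now treated as regular expressions.
sed 's/^/^/' "$dir/file2" > "$dir/patterns"
anchored=$(grep -f "$dir/patterns" "$dir/file1")

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$unanchored" "$anchored"
rm -r "$dir"
```

The unanchored search returns three lines, including the copy-of- one, while the anchored search returns only the two wanted names. The awk versions above that use substr($0,1,7) or $1 are already anchored to the start of the line.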

Upvotes: 0

stack0114106

Reputation: 8711

Using Perl

perl -lne ' BEGIN { $x=join("|", map{chomp;$_} qx(cat mweb2.txt)) } s/^($x)/$1 $1/g and print '

with the inputs

$ cat mweb1.txt
19V17R1-wipedrive-2016.05.23-07.25PM-d0.pdf
19XPT32-wipedrive-2016.05.03-05.50AM-d0.pdf
19XPT32-wipedrive-2016.07.06-08.32PM-d0.pdf
1BC6062-wipedrive-2018.07.26-08.34AM-d0.pdf

$ cat mweb2.txt
19V17R1
1BC6062

$ perl -lne ' BEGIN { $x=join("|", map{chomp;$_} qx(cat mweb2.txt)) } s/^($x)/$1 $1/g and print ' mweb1.txt
19V17R1 19V17R1-wipedrive-2016.05.23-07.25PM-d0.pdf
1BC6062 1BC6062-wipedrive-2018.07.26-08.34AM-d0.pdf

Upvotes: 0

karakfa

Reputation: 67467

If both files are sorted as shown, then simply:

$ join -t- file1 file2

19V17R1-wipedrive-2016.05.23-07.25PM-d0.pdf
1BC6062-wipedrive-2018.07.26-08.34AM-d0.pdf

For the desired output format, this might be easier than setting the -o options of join:

$ join <(awk '{print substr($0,1,7) "\t" $0}' file1) file2

19V17R1 19V17R1-wipedrive-2016.05.23-07.25PM-d0.pdf
1BC6062 1BC6062-wipedrive-2018.07.26-08.34AM-d0.pdf
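
If the files are not already sorted, join will not pair the lines reliably. A minimal sketch of one way around that, assuming deliberately shuffled copies of the sample data in a temporary directory (file2.sorted is a made-up name), and forcing LC_ALL=C so that sort and join agree on the ordering:

```shell
dir=$(mktemp -d)
# Same kind of data as in the question, deliberately out of order.
cat > "$dir/file1" <<'EOF'
1BC6062-wipedrive-2018.07.26-08.34AM-d0.pdf
19XPT32-wipedrive-2016.07.06-08.32PM-d0.pdf
19V17R1-wipedrive-2016.05.23-07.25PM-d0.pdf
EOF
printf '%s\n' 1BC6062 19V17R1 > "$dir/file2"

# join needs both inputs sorted on the join field.
LC_ALL=C sort "$dir/file2" > "$dir/file2.sorted"

# Prepend the 7-character key, sort on that key, then join on field 1.
# "-" tells join to read its first input from the pipe.
joined=$(awk '{ print substr($0, 1, 7), $0 }' "$dir/file1" |
  LC_ALL=C sort -k1,1 |
  LC_ALL=C join - "$dir/file2.sorted")

printf '%s\n' "$joined"
rm -r "$dir"
```

The output comes back in key order rather than in the original order of file1, which may or may not matter for your use.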

Upvotes: 1

RavinderSingh13

Reputation: 133458

Could you please try the following.

awk 'FNR==NR{a[$0]=$0;next} a[$1]{print a[$1],$0}' Input_file2  FS="-" Input_file1

Explanation of the above code:

awk '
FNR==NR{                  ##Checking condition FNR==NR which will be true when first Input_file named file2 is being read.
  a[$0]=$0                ##Creating an array named a whose index is $0 and value is $0.
  next                    ##Using next will skip all further statements from here.
}                         ##Closing block for FNR==NR here.
a[$1]{                    ##Checking condition if a[$1] is NOT NULL then do following.
  print a[$1],$0          ##Printing value of array a whose index is $1 of current line, along with the current line.
}' file2  FS="-" file1    ##Closing block and mentioning Input_file file2 name then setting FS="-" and mentioning Input_file name file1 here.
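
One side note on the a[$1] condition: in awk, merely referencing an array element creates it, so every unmatched key from file1 ends up as an empty entry in a. A minimal sketch comparing that with a membership test, assuming the sample data from the question in a temporary directory; the END block that counts the elements is added only for the demo.

```shell
dir=$(mktemp -d)
cat > "$dir/file1" <<'EOF'
19V17R1-wipedrive-2016.05.23-07.25PM-d0.pdf
19XPT32-wipedrive-2016.05.03-05.50AM-d0.pdf
19XPT32-wipedrive-2016.07.06-08.32PM-d0.pdf
1BC6062-wipedrive-2018.07.26-08.34AM-d0.pdf
EOF
printf '%s\n' 19V17R1 1BC6062 > "$dir/file2"

# a[$1] as a condition: the lookup itself creates a[$1] when it is missing.
by_value=$(awk '
  FNR == NR { a[$0] = $0; next }
  a[$1]     { print a[$1], $0 }
  END       { n = 0; for (k in a) n++; print "elements:", n }
' "$dir/file2" FS="-" "$dir/file1")

# ($1 in a): tests membership without creating anything.
by_key=$(awk '
  FNR == NR { a[$0]; next }
  ($1 in a) { print $1, $0 }
  END       { n = 0; for (k in a) n++; print "elements:", n }
' "$dir/file2" FS="-" "$dir/file1")

printf '%s\n\n%s\n' "$by_value" "$by_key"
rm -r "$dir"
```

Both variants print the same two matches. The first reports 3 elements (the two keys plus an empty entry for 19XPT32), the second reports 2. For small files this makes no practical difference, but with a large file1 the array can grow by one entry per distinct unmatched key.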

Upvotes: 0

paxdiablo

Reputation: 881323

That's as simple as creating the following go.awk:

NR==FNR { lookup[substr($0,1,7)] = $0 }
NR!=FNR { print $0" "lookup[$0] }

Then you run it with:

awk -f go.awk file1.txt file2.txt

The first command is executed for each line in the first input file and it simply stores the entire line in an associative array, keyed on the first seven characters, for later lookup.

The second command, for each line in the second and subsequent input files, outputs the line and the related entry in the associative array. The output you see is exactly what you asked for:

19V17R1 19V17R1-wipedrive-2016.05.23-07.25PM-d0.pdf
1BC6062 1BC6062-wipedrive-2018.07.26-08.34AM-d0.pdf

Now I prefer using scripts since it means I don't have to go searching in my history for arbitrarily complex awk commands but, if you want a one-liner to do the same thing:

awk 'NR==FNR{lookup[substr($0,1,7)]=$0}NR!=FNR{print $0" "lookup[$0]}' file1.txt file2.txt
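
As written, the script prints every line of the second file, so a key with no matching file name would come out as the key followed by a trailing space. A minimal sketch of a guarded variant, assuming a made-up key ZZZZZZZ that has no match and sample files in a temporary directory:

```shell
dir=$(mktemp -d)
cat > "$dir/file1.txt" <<'EOF'
19V17R1-wipedrive-2016.05.23-07.25PM-d0.pdf
19XPT32-wipedrive-2016.05.03-05.50AM-d0.pdf
19XPT32-wipedrive-2016.07.06-08.32PM-d0.pdf
1BC6062-wipedrive-2018.07.26-08.34AM-d0.pdf
EOF
# ZZZZZZZ is a hypothetical key with no corresponding file name.
printf '%s\n' 19V17R1 ZZZZZZZ 1BC6062 > "$dir/file2.txt"

result=$(awk '
  # First file: index each full name by its first seven characters.
  NR == FNR    { lookup[substr($0, 1, 7)] = $0; next }
  # Later files: print only keys that were actually seen in the first file.
  $0 in lookup { print $0, lookup[$0] }
' "$dir/file1.txt" "$dir/file2.txt")

printf '%s\n' "$result"
rm -r "$dir"
```

Note that lookup keeps only the last file name seen for each key, so if a key such as 19XPT32 were in the second file, only its 2016.07.06 name would be printed. If you need every name per key, reading the key file first and streaming the name file, as in some of the other answers, handles that.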

Upvotes: 0
