Reputation: 179
I'm very new to awk and have been banging my head trying to get this to work. I'm trying to take a list of files in "image.list" and create an "info" file from it. I need to grab the string matching a regex (a number 8-11 digits long) from the middle of each filename and print just that match into the designated spot in my "info" file. That last part is what I'm having trouble with, and I'd love some help fixing it.
Here is my test file list:
SURGERY0001275678image1.jpg
SURGERY11134900211image2.jpg
SURGERY19257012image3.jpg
SURGERY273142590image4.jpg
Here is my current code:
awk 'BEGIN {print "-----TEST TAG FILE\tENCOUNTERS-----";}
> {print "FILE: /tmp/imagetest/"$1,"\t","ENCOUNTER: ",($1~/^[0-9]{8,11}$/);}
> END{print "END REPORT";
> }' image.list > upload.tag
And here is my current output:
-----TEST TAG FILE ENCOUNTERS-----
FILE: /tmp/imagetest/SURGERY0001275678image1.jpg ENCOUNTER: 0
FILE: /tmp/imagetest/SURGERY11134900211image2.jpg ENCOUNTER: 0
FILE: /tmp/imagetest/SURGERY19257012image3.jpg ENCOUNTER: 0
FILE: /tmp/imagetest/SURGERY273142590image4.jpg ENCOUNTER: 0
END REPORT
What I need it to display after "ENCOUNTER:" is the 8-11 digit number in the middle of the file name. So far everything I've tried outputs either the whole filename or "0".
I'm probably way off course so I'd love to get some help from you experts!
Upvotes: 2
Views: 301
Reputation: 77095
Re-using your existing code:
$ awk '
BEGIN {
print "-----TEST TAG FILE\tENCOUNTERS-----";
}
match($0,/[^0-9]+([0-9]+)[^0-9]+/,ary) {
print "FILE: /tmp/imagetest/"$1,"\t","ENCOUNTER:"ary[1]
}
END {
print "END REPORT";
}' testfile
$ cat testfile
SURGERY0001275678image1.jpg
SURGERY11134900211image2.jpg
SURGERY19257012image3.jpg
SURGERY273142590image4.jpg
$ awk '
> BEGIN {
> print "-----TEST TAG FILE\tENCOUNTERS-----";
> }
> match($0,/[^0-9]+([0-9]+)[^0-9]+/,ary) {
> print "FILE: /tmp/imagetest/"$1,"\t","ENCOUNTER:"ary[1]
> }
> END {
> print "END REPORT";
> }' testfile
-----TEST TAG FILE ENCOUNTERS-----
FILE: /tmp/imagetest/SURGERY0001275678image1.jpg ENCOUNTER:0001275678
FILE: /tmp/imagetest/SURGERY11134900211image2.jpg ENCOUNTER:11134900211
FILE: /tmp/imagetest/SURGERY19257012image3.jpg ENCOUNTER:19257012
FILE: /tmp/imagetest/SURGERY273142590image4.jpg ENCOUNTER:273142590
END REPORT
As Ed Morton pointed out in the comments, the array argument to match() is a GNU awk extension, so this solution works with GNU awk only.
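If gawk isn't available, roughly the same report can be produced with the POSIX form of match(), which sets RSTART and RLENGTH instead of filling an array. The following is only a sketch under a few assumptions: the shell function name extract_encounters is made up for illustration, the regex is relaxed to [0-9]+ because some older awks don't support interval expressions like {8,11}, and the output spacing is simplified compared with the code above.

```shell
# Hypothetical helper: reads file names on stdin and prints the tag report.
extract_encounters() {
    awk '
    BEGIN { print "-----TEST TAG FILE\tENCOUNTERS-----" }
    match($0, /[0-9]+/) {
        # RSTART and RLENGTH describe the first run of digits in the line
        print "FILE: /tmp/imagetest/" $1 "\tENCOUNTER:" substr($0, RSTART, RLENGTH)
    }
    END { print "END REPORT" }'
}

# Feed two of the sample names through the helper
printf '%s\n' SURGERY0001275678image1.jpg SURGERY19257012image3.jpg | extract_encounters
```

Because substr() returns a string, leading zeros such as the "000" in 0001275678 are kept.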
Upvotes: 5
Reputation: 203368
Here's the commonly-written awk function "extract()" to extract a string that matches an RE:
awk -v re='<whatever>' '
function extract(str,regexp)
{ RMATCH = (match(str,regexp) ? substr(str,RSTART,RLENGTH) : "")
return RSTART
}
extract($0,re) { print RMATCH }
'
Just set "re" to whatever you want to match, e.g.:
$ cat file
SURGERY0001275678image1.jpg
SURGERY11134900211image2.jpg
SURGERY19257012image3.jpg
SURGERY273142590image4.jpg
$ awk -v re='[[:digit:]]{8,11}' '
function extract(str,regexp)
{ RMATCH = (match(str,regexp) ? substr(str,RSTART,RLENGTH) : "")
return RSTART
}
extract($0,re) { print RMATCH }
' file
0001275678
11134900211
19257012
273142590
or if you prefer a more specific solution using the same match()+substr() approach:
$ awk '
BEGIN{ print "-----TEST TAG FILE\tENCOUNTERS-----" }
{ printf "FILE: %s\tENCOUNTER: %s\n", $0, (match($0,/[[:digit:]]{8,11}/) ? substr($0,RSTART,RLENGTH) : 0) }
END{ print "END REPORT" }
' file
-----TEST TAG FILE ENCOUNTERS-----
FILE: SURGERY0001275678image1.jpg ENCOUNTER: 0001275678
FILE: SURGERY11134900211image2.jpg ENCOUNTER: 11134900211
FILE: SURGERY19257012image3.jpg ENCOUNTER: 19257012
FILE: SURGERY273142590image4.jpg ENCOUNTER: 273142590
END REPORT
Note that if all of your file names follow the same pattern and don't have other digits before the run of 8-11 digits you care about, you could just use [[:digit:]]+ as the matching RE instead of explicitly specifying the range [[:digit:]]{8,11} if you like.
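To see where the two REs differ, here is a small sketch. The file name with a stray leading digit is invented for illustration, first_match is a hypothetical helper name, and the second call assumes an awk that supports interval expressions (as the code above already does).

```shell
# Hypothetical helper: print the first substring of each input line matching RE ($1)
first_match() {
    awk -v re="$1" 'match($0, re) { print substr($0, RSTART, RLENGTH) }'
}

# Made-up name with a stray digit before the encounter number
name='OR2SURGERY19257012image3.jpg'

printf '%s\n' "$name" | first_match '[[:digit:]]+'       # matches the stray "2"
printf '%s\n' "$name" | first_match '[[:digit:]]{8,11}'  # skips ahead to 19257012
```

match() returns the leftmost match, so the unbounded + stops at the first digit it sees, while the {8,11} range only accepts a long enough run.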
Upvotes: 2
Reputation: 3756
sed -r -e 's#(.*)#FILE:\t/tmp/imagetest/\1#;s/([0-9]*)(i[^i]*)$/\1\2\tENCOUNTER:\1/;1i -----TEST TAG FILE ENCOUNTERS-----' -e '$aEND REPORT' file
-----TEST TAG FILE ENCOUNTERS-----
FILE: /tmp/imagetest/SURGERY0001275678image1.jpg ENCOUNTER:0001275678
FILE: /tmp/imagetest/SURGERY11134900211image2.jpg ENCOUNTER:11134900211
FILE: /tmp/imagetest/SURGERY19257012image3.jpg ENCOUNTER:19257012
FILE: /tmp/imagetest/SURGERY273142590image4.jpg ENCOUNTER:273142590
END REPORT
Upvotes: 3
Reputation: 208465
Try the following:
awk 'BEGIN {print "-----TEST TAG FILE\tENCOUNTERS-----";}
{print "FILE: /tmp/imagetest/"$1,"\t","ENCOUNTER: ",gensub(/[^0-9]*([0-9]*).*/, "\\1", 1, $1);}
END{print "END REPORT";
}' image.list > upload.tag
Upvotes: 0
Reputation: 3534
This
awk 'BEGIN {print "-----TEST TAG FILE\tENCOUNTERS-----";}
{printf "FILE: /tmp/imagetest/"$1"\tENCOUNTER: ";if($1~/[0-9]{8,11}/){sub(/[0-9]+\.jpg$/,"",$1); gsub(/[a-zA-Z]/,"",$1);print $1}}
END{print "END REPORT";
}' image.list
will print
-----TEST TAG FILE ENCOUNTERS-----
FILE: /tmp/imagetest/SURGERY0001275678image1.jpg ENCOUNTER: 0001275678
FILE: /tmp/imagetest/SURGERY11134900211image2.jpg ENCOUNTER: 11134900211
FILE: /tmp/imagetest/SURGERY19257012image3.jpg ENCOUNTER: 19257012
FILE: /tmp/imagetest/SURGERY273142590image4.jpg ENCOUNTER: 273142590
END REPORT
Upvotes: 0
Reputation: 780899
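awk '{encounter=$1;
# sub() does not support "\\1" backreferences, so strip the text around the digits instead
sub(/^[^0-9]*/, "", encounter); sub(/[^0-9].*/, "", encounter);
print "FILE: /tmp/imagetest/"$1,"\t","ENCOUNTER: ",encounter;}'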
Upvotes: 0
Reputation: 45652
Try this:
$ cat input
SURGERY0001275678image1.jpg
SURGERY11134900211image2.jpg
SURGERY19257012image3.jpg
SURGERY273142590image4.jpg
$ awk '{split($1,a,/[[:alpha:]]+/);print a[2]}' input
0001275678
11134900211
19257012
273142590
Upvotes: 0