Reputation: 424
I have some text from which I want to grab everything up to and including the year. I started with something like this:
awk '/[1-2][0-9][0-9][0-9]/{print $1}'
but that only prints the first "word" of the input:
"Financial summary 1997 FINAL.doc" => "Financial"
"v4 Minutes 19950705" => "v4"
What I want is "Financial summary 1997" and "v4 Minutes 1995". I've played around with $NF and various others with no success so far. I don't know how many words or how many numbers there are, so I can't just print $1 $2 $3. I don't have to use awk, but it would be useful, since I am actually going to print the results with some surrounding tags for output to an HTML file.
I can set the field separator to "4 digits", but that discards the year:
awk -F'[1-2][0-9][0-9][0-9]' '{print $1}'
EDIT: This is what I ended up with and then came back to see a solution posted:
awk 'match($0,/.*(19|20)[0-9]{2}/){print substr($0,RSTART,RLENGTH)}'
Thanks all
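For the HTML output mentioned above, the same match/substr idea can be wrapped in a small shell function. This is only a minimal sketch: the function name up_to_year and the <li> tag are placeholders I made up, the input is assumed to be unquoted, and [0-9][0-9] is spelled out instead of {2} so that awks without interval expressions also accept it.

```shell
# Hypothetical helper: print everything up to and including the year,
# wrapped in <li> tags (the tag is an assumed placeholder).
# Lines without a 19xx/20xx number are skipped.
up_to_year() {
  awk 'match($0, /.*(19|20)[0-9][0-9]/) {
         print "<li>" substr($0, RSTART, RLENGTH) "</li>"
       }' "$@"
}

printf '%s\n' 'Financial summary 1997 FINAL.doc' 'v4 Minutes 19950705' | up_to_year
# <li>Financial summary 1997</li>
# <li>v4 Minutes 1995</li>
```

The sketch does not escape characters such as & or <, so real file names may need that handled before they go into an HTML file.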
Upvotes: 1
Views: 1543
Reputation: 1517
awk '{sub(/97/,"97\"");print $1,$2,$3}' file
"Financial summary 1997"
"v4 Minutes 19950705"
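All we have to do is print the first three fields and add a double quote after 1997. sub adds the double quote, which has to be escaped with a backslash (\"). Note that this only matches the literal 97 and assumes exactly three fields, which is why the second line is printed unchanged.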
Upvotes: 0
Reputation: 784918
You can use this grep command:
grep -oE '[^"]* [1-2][0-9]{3}' file
Financial summary 1997
v4 Minutes 1995
For awk, you can use gensub (a GNU awk extension):
awk '{ print substr(gensub(/^(.* [1-2][0-9]{3}).*/, "\\1", "1"), 2) }' file
Financial summary 1997
v4 Minutes 1995
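One thing to keep in mind: the patterns above are greedy, so a line containing more than one year-like number is cut at the last one. If the first year is wanted instead, one option is to match the year on its own and print from the start of the line. This is a sketch rather than part of the commands above; first_year_prefix is a made-up name and the input is assumed to be unquoted.

```shell
# Hypothetical helper: print from the start of the line through the
# FIRST four-digit number beginning with 1 or 2.
first_year_prefix() {
  awk 'match($0, /[12][0-9][0-9][0-9]/) {
         # RSTART is where the year begins, RLENGTH is its length (4)
         print substr($0, 1, RSTART + RLENGTH - 1)
       }' "$@"
}

printf '%s\n' 'Report 1997 rev 2003.doc' 'v4 Minutes 19950705' | first_year_prefix
# Report 1997
# v4 Minutes 1995
```

This uses only match, RSTART and RLENGTH, so it should not depend on GNU awk.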
Upvotes: 2
Reputation: 92854
awk solution:
awk 'match($0,/.*\<[12][0-9]{3}/){ print substr($0,RSTART,RLENGTH)"\042" }' file
The output:
"Financial summary 1997"
"v4 Minutes 1995"
Upvotes: 1
Reputation: 18351
sed -r 's/(.*[1-2][0-9][0-9][0-9]).*/\1"/' input
"Financial summary 1997"
"v4 Minutes 1995"
Upvotes: 0