Reputation: 151
The input is as follows:
Title: Aoo Boo
Author: First Last
I am trying to output
Aoo Boo, First Last, "
by using awk like this
awk 'BEGIN { FS="[:[:space:]]+" }
/Title/ { sub(/^Title: /,""); t = $0; } # save title
/Author/{ sub(/^Author: /,""); printf "%s,%s,\"\n", t, $0}
' t.txt
But the output looks like ,"irst Last. The later text is printed from the beginning of the line, over what was already there.
But if I change $0 to $2, the output is as expected: Boo,Last,"
Why is the $0 version incorrect? What is the right way to do this?
Upvotes: 1
Views: 1121
Reputation: 46856
This assumes there are no colons in titles or names...
awk -F': *' '
  $1=="Title" {
    sub(/[^[:print:]]/,"");
    t=$2;
  }
  $1=="Author" {
    sub(/[^[:print:]]/,"");
    printf("%s, %s\n", t, $2);
  }
' inputfile.txt
This works by finding the title and storing it in a variable, then finding the author and using that as a trigger to print everything according to your format. You can alter the format as you see fit.
It may break if there are extra colons on the line, as the colon is being used to split fields. It may also break if your input doesn't match your example.
Perhaps the most important thing in this example is the sub(...) calls, which strip off non-printable characters like the carriage return that rici noticed you have. The regular expression [^[:print:]] matches any character that is not "printable", and the carriage return is one such character. Each sub call removes the first such character on the line, which is enough for a single trailing carriage return, and should do no harm if there is none.
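As a minimal sketch of the same idea (not the script above, and assuming the only stray character is a carriage return at the end of each line), the \r can be removed explicitly before the line is examined. The file name crlf_sample.txt is made up for the example, and the output format follows the one asked for in the question.

```shell
# Build a sample file with Windows (CRLF) line endings; the name is hypothetical.
printf 'Title: Aoo Boo\r\nAuthor: First Last\r\n' > crlf_sample.txt

result=$(awk '
    { sub(/\r$/, "") }                              # drop a trailing CR, if any
    /^Title: /  { sub(/^Title: /, "");  t = $0 }    # save the title
    /^Author: / { sub(/^Author: /, ""); printf "%s, %s, \"\n", t, $0 }
' crlf_sample.txt)

printf '%s\n' "$result"
```

Because the first rule has no pattern, it runs on every line, so both t and $0 are already clean by the time they are printed.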
Upvotes: 0
Reputation: 241791
You need to get rid of the Windows line endings in your text file if you want to use Unix utilities.
If you're lucky, you'll find you have the dos2unix program installed, and you'll only need to do this:
dos2unix t.txt
If not, you could do it with tr:
tr -d '\r' < t.txt > new_t.txt
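To confirm that carriage returns really are the problem, and that the conversion removed them, you can count them before and after. This is only a sketch; t_crlf.txt and t_lf.txt are made-up names standing in for your files.

```shell
# Sample input with CRLF line endings (hypothetical file name).
printf 'Title: Aoo Boo\r\nAuthor: First Last\r\n' > t_crlf.txt

# tr -cd '\r' deletes everything except CRs, so wc -c counts the CRs.
before=$(( $(tr -cd '\r' < t_crlf.txt | wc -c) ))

# The conversion itself, as in the tr command above.
tr -d '\r' < t_crlf.txt > t_lf.txt

after=$(( $(tr -cd '\r' < t_lf.txt | wc -c) ))
printf 'CRs before: %d, after: %d\n' "$before" "$after"
```

A nonzero count before and zero after means the file had Windows line endings and the copy no longer does.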
For reference, what is going on is that Windows files have \r\n at the end of every line (that is, a CR control code followed by an NL control code). On Linux, lines end with just the \n, so the \r is part of the data; when you print it out, the terminal interprets it as a "carriage return", which moves the cursor to the beginning of the current line rather than advancing to the next line. Since the value of t ends with a \r, the text that follows it is printed over the value of t. The \r at the end of $0 does the same to the closing ," which is why the output starts with those two characters.
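A small sketch that reproduces the effect without awk; the variable names are made up for the example. Piping the same line through od -c makes the otherwise invisible \r show up.

```shell
# Title and author as awk sees them in a CRLF file: each ends with a CR.
# Command substitution strips only trailing newlines, so the \r survives.
t=$(printf 'Aoo Boo\r')
a=$(printf 'First Last\r')

# Same layout as the printf in the question.
line=$(printf '%s,%s,"' "$t" "$a")

# On a terminal this displays as ,"irst Last because each CR
# moves the cursor back to the start of the line.
printf '%s\n' "$line"

# od -c dumps the bytes one by one and shows each CR as \r.
dump=$(printf '%s\n' "$line" | od -c)
printf '%s\n' "$dump"
```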
It works with $2 because you've reassigned FS to include [:space:]; that definition of field separators is more generous than the awk default, since it includes \r and \f, neither of which is a default field separator. Consequently, $2 does not contain the \r, but $0 does.
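One way to see the difference, sketched under the assumption that the Author line ends in \r\n as described: count the carriage returns in $0 and in the field that holds the last name. crlf_author.txt is a made-up file name. Without the sub() from the question the line still begins with Author, so the last name is $3 here rather than $2.

```shell
# One CRLF-terminated line (hypothetical file name).
printf 'Author: First Last\r\n' > crlf_author.txt

# gsub returns the number of replacements it made, so it doubles as a counter.
# Copies are used so that $0 and the fields themselves are left untouched.
counts=$(awk 'BEGIN { FS = "[:[:space:]]+" }
    {
        whole = $0      # the full record, CR included
        last  = $3      # the last-name field under this FS
        print gsub(/\r/, "", whole), gsub(/\r/, "", last)
    }' crlf_author.txt)

printf '%s\n' "$counts"   # prints: 1 0
```

The \r is consumed as a separator when the record is split, so it never ends up inside a field, while $0 keeps the record exactly as it was read.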
Upvotes: 3