Reputation: 32081
I have a gawk script that has accumulated a bunch of HTML in a variable and now needs to pipe it to lynx via a system command.
(Feel free to tell me awk is a bad fit here; my first attempt, a bash while read LINE loop, was wildly slow, so this is take 2.)
I tried this in awk:
cmd = sprintf( "bash -c \'lynx -dump -force_html -stdin <<< \"%s\"\'", html )
system ( cmd )
Bad idea: although simple test cases work, with raw HTML the special-character and string-termination issues abound, and the escapes-within-escapes-within-escapes get mind-bogglingly complex.
lynx handles whatever I throw at it on stdin just fine; I just can't get the data to its stdin from awk without passing it through the command line, which seems like an unwieldy solution.
Edit (adding detail about my end goal) in case awk isn't a good approach:
What I want is to parse HTML out of a large text file with delimiters between blocks of html. I need to pass each block of HTML to lynx to be formatted and dump that into a new, big text file.
Example input (a dump from another system):
**********URL: http://some/url
<html>
<head><title>Any 'ol HTML document</title></head>
<body>
<p>With pretty much any character you can imagine at some point</p>
<p>I'm using lynx to strip off the HTML and give me a nice format</p>
</body>
</html>
**********URL: http://another/url
<html><head><title>My input file provides a few 100,000 such html documents</title></head>
<body/></html>
Each HTML document should be fed through lynx -dump. Lynx can read the HTML from a file (a named pipe works too) or from stdin (with the -stdin option).
My output is then:
**********URL: http://some/url
Any 'ol HTML document
With pretty much any character you can imagine at some point
I'm using lynx to strip off the HTML and give me a nice format
**********URL: http://another/url
My input file provides a few 100,000 such html documents
Upvotes: 0
Views: 1306
Reputation: 32081
To add to n0741337's answer, here's an example using gawk coprocesses that I put together after reading it. It takes "aline" from stdin, pipes it to a cat coprocess, then captures the coprocess's output and prints it:
printf "aline" | gawk '
BEGIN { cmd = "cat" }
{
    print $0 |& cmd
    close(cmd, "to")
    while ((cmd |& getline line) > 0) {
        print "got", line
    }
    close(cmd)
}'
result: got aline
The gawk manual has a more extensive discussion of this feature: http://www.gnu.org/software/gawk/manual/html_node/Two_002dway-I_002fO.html#Two_002dway-I_002fO
Upvotes: 0
Reputation: 2514
Try |& in gawk, which I found out about from here. That lets you send output from gawk to the stdin of another command running as a coprocess.
Upvotes: 1