Reputation: 1839
First off, sorry for the slightly misleading title; I'm not sure how to describe this yet.
Basically, I have a list of keywords and I want to fetch the number of results Google reports for each query. I have created the following awk script:
{
x = ""
for(i=1;i<=NF;i++) {
if(i==NF) {
x = x $i
} else {
x = x $i "+"
}
}
tab = "777" # id of an existing chrome tab as reported by 'chrome-cli list tabs'
system("chrome-cli open http://www.google.com/search?hl=en\\&q="x" -t " tab)
system("chrome-cli source -t " tab " | grep '<div id=\"resultStats\">About .* results<nobr>' | head -1 | sed -e 's/.*>About \(.*\) results<nobr>.*/\1/' | awk '{print $1\"\t"x"\"}' >> freq.log " );
system("cat freq.log" );
system("sleep 0.5");
}
What happens here is that I first replace all spaces with + signs, run chrome-cli to open the search URL in that particular tab, download the page source, parse out the number between the "About" and "results" strings, and append the result to freq.log. However, this writes the following string to the file (for the term alarm):
"})();</script><div alarm"
When I execute the same command from the macOS terminal, I get the correct number (it returns 127.000.000):
chrome-cli source -t 777 | grep '<div id="resultStats">About .* results<nobr>' | head -1 | sed -e 's/.*>About \(.*\) results<nobr>.*/\1/'
So my problem is that while everything works correctly from the terminal, as soon as I move the command into awk and run it through a call to system, something breaks and the regex no longer works.
Upvotes: 0
Views: 341
Reputation: 20889
You've properly escaped " in your system commands, but it looks like you haven't escaped the \ in your sed command. By the time it reaches sed, \( is being seen as a plain (.
Try changing your system statements to print and you'll see what I mean.
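As a minimal sketch of the fix described above: inside an awk string literal, each backslash intended for sed has to be doubled, so \( becomes \\( and \1 becomes \\1 (an undoubled \1 is read by awk as an octal escape rather than a backreference). The helper name extract_count and the sample input line are assumptions made for illustration; only the sed expression comes from the question.

```shell
#!/bin/sh
# extract_count is a hypothetical helper, not part of the original script.
# Within the awk string, every backslash meant for sed is doubled
# (\\( \\) \\1), and \047 is awk's octal escape for a single quote.
extract_count() {
    awk '{
        cmd = "sed -e \047s/.*>About \\(.*\\) results<nobr>.*/\\1/\047"
        print $0 | cmd   # feed the current line to sed
        close(cmd)       # wait for sed to finish before the next line
    }'
}

# Sample line modeled on the grep pattern from the question (assumed input).
printf '%s\n' '<div id="resultStats">About 127.000.000 results<nobr>' | extract_count
# prints 127.000.000
```

Replacing `print $0 | cmd` with `print cmd` shows the exact command string the shell would receive, which is the debugging step suggested above.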
Worst case scenario, you can bundle the series of system commands into a shell script and have awk call it instead... but in that case, you might as well use shell scripting entirely instead of awk.
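For comparison, here is a rough sketch of what the all-shell route might look like. It reuses the chrome-cli invocations and the grep/sed pipeline from the question unchanged; the function names parse_count and fetch_count, the TAB variable, and the keywords.txt file are assumptions introduced for illustration. Because no awk string is involved, the sed expression needs no extra escaping.

```shell
#!/bin/sh
# Pure-shell sketch; parse_count, fetch_count, TAB and keywords.txt are
# hypothetical names, not taken from the original post.
TAB=777   # id of an existing Chrome tab, as reported by 'chrome-cli list tabs'

# Read page source on stdin and print the number between "About" and "results".
parse_count() {
    grep '<div id="resultStats">About .* results<nobr>' |
        head -1 |
        sed -e 's/.*>About \(.*\) results<nobr>.*/\1/'
}

# Query one keyword phrase and append "count<TAB>query" to freq.log.
fetch_count() {
    query=$(printf '%s' "$1" | tr -s ' ' '+')   # spaces become + signs
    chrome-cli open "http://www.google.com/search?hl=en&q=$query" -t "$TAB"
    printf '%s\t%s\n' "$(chrome-cli source -t "$TAB" | parse_count)" "$query" >> freq.log
}

# Usage (not run here), assuming one keyword phrase per line in keywords.txt:
# while IFS= read -r line; do fetch_count "$line"; sleep 0.5; done < keywords.txt
```

Keeping the parsing in its own function means it can be checked against saved page source without opening a browser tab.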
Upvotes: 2