Reputation: 1279
I'm using httpry to capture HTTP packets from an interface on my machine. It outputs the results in a very clean format, with columns separated by either spaces or tabs. Here's a sample line from the output:
2012-11-27 20:29:22 192.168.1.132 74.125.224.51 > GET www.google.com / HTTP/1.1 - -
I'm trying to write a script (in either bash or Python) that grabs the hostname from each line, in this case www.google.com, and writes it to a file. Writing to a file is easy enough, but I don't have any experience parsing text based on spaces or tabs. If anyone could point me in the right direction, that'd be great. Thanks for the help.
Upvotes: 1
Views: 1168
Reputation: 176
You can use "set --" in bash to split strings into words based on whitespace. Example:
echo "2012-11-27 20:29:22 192.168.1.132 74.125.224.51 > GET www.google.com / HTTP/1.1 - -" \
| while read -r line; do
    # $line is left unquoted on purpose: word splitting puts each field in $1, $2, ...
    set -- $line
    N=$#
    for ((i=0; i<N; i++)); do
        echo "Field $i = '$1'"
        shift
    done
done
The output:
Field 0 = '2012-11-27'
Field 1 = '20:29:22'
Field 2 = '192.168.1.132'
Field 3 = '74.125.224.51'
Field 4 = '>'
Field 5 = 'GET'
Field 6 = 'www.google.com'
Field 7 = '/'
Field 8 = 'HTTP/1.1'
Field 9 = '-'
Field 10 = '-'
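To extract only the hostname, which is the seventh word on the line (labelled Field 6 above because the loop counts from 0), use $7:
while read -r line; do set -- $line; echo "$7"; done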
Upvotes: 1
Reputation: 309821
It seems to me that awk is the tool for the job here (from within a bash script):
httpry -other -args -here | awk '{print $7}' > outfile.txt
Upvotes: 3
Reputation: 26778
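In Python, just use the split method for strings. Called with no arguments, it splits on any run of whitespace, so spaces and tabs are both handled; the hostname is then element 6 of the resulting list.
Code:
data = "2012-11-27 20:29:22 192.168.1.132 74.125.224.51 > GET www.google.com / HTTP/1.1 - -"
print(data.split())
Output: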
['2012-11-27', '20:29:22', '192.168.1.132', '74.125.224.51', '>', 'GET', 'www.google.com', '/', 'HTTP/1.1', '-', '-']
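To go from a single split to the script the question asks for, something like the following might work. It is only a sketch: the function names extract_host and write_hosts and the output file name hosts.txt are made up for illustration, and it assumes the hostname is always the seventh whitespace-separated field, as in the sample line. Lines with fewer fields are skipped.

```python
def extract_host(line, index=6):
    """Return the whitespace-separated field at `index`, or None if the line is too short."""
    fields = line.split()  # no argument: splits on any run of spaces or tabs
    if len(fields) <= index:
        return None
    return fields[index]


def write_hosts(lines, out):
    """Write one host per line to the file-like object `out`; return how many were written."""
    count = 0
    for line in lines:
        host = extract_host(line)
        if host is not None:
            out.write(host + "\n")
            count += 1
    return count


# Hypothetical usage in a script that reads httpry output from a pipe:
#   import sys
#   with open("hosts.txt", "w") as out:
#       write_hosts(sys.stdin, out)
```

Saved as a script with the commented usage enabled, it could be placed at the end of a pipeline in the same way as the awk answer, with httpry's output piped into it.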
Upvotes: 3