LandonWO

Reputation: 1279

Parsing line for string based on whitespace/tabs with bash or python

I'm using httpry to pull HTTP packets from an interface on my machine. It outputs the results in a very clean format, with columns separated by spaces or tabs. Here's a sample line from the output.

2012-11-27 20:29:22     192.168.1.132   74.125.224.51   >       GET     www.google.com  /       HTTP/1.1        -       -

I'm trying to write a script (in either bash or python) that grabs the website from each line, in this case www.google.com, and writes it to a file. Writing to a file is easy enough, but I don't have any experience parsing based on whitespace or tabs. If anyone could point me in the right direction on how to do this, that'd be great. Thanks for the help.

Upvotes: 1

Views: 1168

Answers (3)

Alex Tomlinson

Reputation: 176

You can use "set --" in bash to split strings into words based on whitespace. Example:

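echo "2012-11-27 20:29:22     192.168.1.132   74.125.224.51   >       GET     www.google.com  /       HTTP/1.1        -       -" \
| while read -r line; do
    # Unquoted $line is word-split on whitespace into $1, $2, ...
    # (it is also subject to glob expansion; "set -f" disables that)
    set -- $line
    N=$#
    for ((i=0; i<N; i++)); do
        echo "Field $i = '$1'"
        shift
    done
done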

The output:

Field 0 = '2012-11-27'
Field 1 = '20:29:22'
Field 2 = '192.168.1.132'
Field 3 = '74.125.224.51'
Field 4 = '>'
Field 5 = 'GET'
Field 6 = 'www.google.com'
Field 7 = '/'
Field 8 = 'HTTP/1.1'
Field 9 = '-'
Field 10 = '-'

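To extract only the hostname, use $7. Positional parameters count from 1, so the seventh word is the one labelled Field 6 above, where the loop counts from 0. Try

while read -r line; do set -- $line; echo "$7"; done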

Upvotes: 1

mgilson

Reputation: 309821

It seems to me that awk is the tool for the job here (from within a bash script):

httpry -other -args -here | awk '{print $7}' > outfile.txt

Upvotes: 3

Marwan Alsabbagh

Reputation: 26778

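In Python, just use the split method for strings. Called with no arguments, it splits on any run of whitespace, so spaces and tabs are handled the same way.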

code

data = "2012-11-27 20:29:22     192.168.1.132   74.125.224.51   >       GET     www.google.com  /       HTTP/1.1        -       -"
print(data.split())

output

['2012-11-27', '20:29:22', '192.168.1.132', '74.125.224.51', '>', 'GET', 'www.google.com', '/', 'HTTP/1.1', '-', '-']
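
To go from the list above to what the question asks for, here is a minimal sketch that picks out the hostname column and writes one host per line. It assumes the hostname is always the seventh column, as in the sample line; the names extract_host, iter_hosts, write_hosts and the output file hosts.txt are made up for illustration and are not part of httpry.

```python
HOST_INDEX = 6  # assumption: hostname is the 7th column (0-based index 6), as in the sample


def extract_host(line, index=HOST_INDEX):
    """Return the field at `index`, or None if the line has too few fields."""
    fields = line.split()  # no argument: split on any run of spaces or tabs
    if len(fields) <= index:
        return None
    return fields[index]


def iter_hosts(lines):
    """Yield the hostname from each line that has enough fields."""
    for line in lines:
        host = extract_host(line)
        if host is not None:
            yield host


def write_hosts(lines, out_path="hosts.txt"):
    """Write one hostname per line to out_path (a hypothetical file name)."""
    with open(out_path, "w") as out:
        for host in iter_hosts(lines):
            out.write(host + "\n")
```

To run it on live output, add "import sys" and a call to write_hosts(sys.stdin) at the bottom of the script, then pipe httpry into it. Short or blank lines are skipped rather than raising an IndexError.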

Upvotes: 3
