Thomas

Reputation: 1195

Extract HTML Form / Input Content with AWK

I want to extract the form content for processing.

What I already get from curl, using mycurlcommand | grep "type=\"hidden", is:

<input type="hidden" name="var1" value="ABC">
<input type="hidden" name="var2" value="DEF">
<input type="hidden" name="var3" value="GHI">
<input type="hidden" name="var4" value="JKL">
<input type="hidden" name="var5" value="">

I want to get this:

var1=ABC
var2=DEF
var3=GHI
var4=JKL
var5=

so that I can process it and pass it back to curl. I am sure this is possible with awk/cut/sed; other XML parsing tools are not available on my limited Linux install (small storage).

Upvotes: 1

Views: 624

Answers (2)

Sundeep

Reputation: 23667

Since you mention that XML parsing tools are not available, you can use these solutions. However, they may not work if the input differs from the sample shown in the question. As a bonus, they eliminate the need for the grep command mentioned in the question.

$ # use = or " characters as input field separator
$ # set = as output field separator
$ # print the required fields
$ awk -F'[="]' -v OFS='=' '/type="hidden"/{print $6, $9}' ip.txt
var1=ABC
var2=DEF
var3=GHI
var4=JKL
var5=

$ # this is useful when number of fields isn't fixed
$ # but the order has to be name followed by value
$ sed -nE '/type="hidden"/ s/.*name="([^"]*)".*value="([^"]*)".*/\1=\2/p' ip.txt
var1=ABC
var2=DEF
var3=GHI
var4=JKL
var5=
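
To see why the awk command above prints $6 and $9, it can help to dump every field that the [="] separator produces for one sample line. This is only an illustrative sketch: show_fields is a hypothetical helper name, not part of the original answer.

```shell
# show_fields (hypothetical helper): print each awk field with its index,
# using = or " as the field separator, exactly as in the command above.
show_fields() {
  awk -F'[="]' '{ for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }'
}

printf '%s\n' '<input type="hidden" name="var1" value="ABC">' | show_fields
# 1:[<input type]
# 2:[]
# 3:[hidden]
# 4:[ name]
# 5:[]
# 6:[var1]
# 7:[ value]
# 8:[]
# 9:[ABC]
# 10:[>]
```

The empty fields (2, 5, 8) sit between each = and the " that follows it. Any extra attribute placed before name or value shifts these numbers, which is why the awk one-liner depends on the exact layout of the sample.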

Upvotes: 4

RavinderSingh13

Reputation: 133428

Could you please try the following? (Since the OP mentioned that no other tools are available and asked for guidance in awk or shell, I am going with this solution.) I am passing Input_file to the awk command; if you are piping your_command's output to awk instead, change it to your_command | awk .....

awk '
match($0,/name="[^"]*/){
  val1=substr($0,RSTART,RLENGTH)
  match($0,/value="[^"]*/)
  val2=substr($0,RSTART,RLENGTH)
  sub(/.*"/,"",val1)
  sub(/.*"/,"",val2)
  print val1"="val2
  val1=val2=""
}'  Input_file

Explanation: a detailed explanation of the above.

awk '                               ##Start the awk program.
match($0,/name="[^"]*/){            ##Match from name=" up to (not including) the next " in the current line.
  val1=substr($0,RSTART,RLENGTH)    ##Save the matched substring of the current line into val1.
  match($0,/value="[^"]*/)          ##Match from value=" up to (not including) the next " in the current line.
  val2=substr($0,RSTART,RLENGTH)    ##Save the substring from this second match (its RSTART and RLENGTH) into val2.
  sub(/.*"/,"",val1)                ##Remove everything up to and including the " from val1.
  sub(/.*"/,"",val2)                ##Remove everything up to and including the " from val2.
  print val1"="val2                 ##Print val1, an = sign, and val2.
  val1=val2=""                      ##Reset val1 and val2.
}' Input_file                        ##Name of the input file.
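
Since the question mentions passing the result back to curl, the same match()/substr() approach could also join the pairs into a single name=value&name=value string. This is a minimal sketch under assumptions: join_pairs is a hypothetical helper name, the values are not URL-encoded, and each input tag is assumed to sit on one line as in the sample.

```shell
# join_pairs (hypothetical helper): read <input ...> lines on stdin and
# print one line of the form name1=value1&name2=value2...
join_pairs() {
  awk '
  match($0, /name="[^"]*/) {
    # Skip the 6 characters of name=" and keep the rest of the match.
    name = substr($0, RSTART + 6, RLENGTH - 6)
    value = ""
    # Skip the 7 characters of value=" if a value attribute is present.
    if (match($0, /value="[^"]*/))
      value = substr($0, RSTART + 7, RLENGTH - 7)
    out = out sep name "=" value
    sep = "&"
  }
  END { print out }'
}

printf '%s\n' \
  '<input type="hidden" name="var1" value="ABC">' \
  '<input type="hidden" name="var5" value="">' | join_pairs
# var1=ABC&var5=
```

The resulting string could then be captured with command substitution and handed to curl's --data option. If the values may contain characters such as & or spaces, they would need encoding first, which this sketch does not attempt.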

Upvotes: 3
